In traditional programming languages, the data types reflect the design of the underlying computer hardware. For example, in C and C++, the integer data type can represent a range of values that varies depending on the computer model. In a 16-bit computer, the range of integers is plus or minus about 33,000; in a 32-bit computer, the range is plus or minus about 2 billion. The designers of Java are proud that their language specification requires the use of 32-bit integers regardless of the underlying hardware. In most implementations of Scheme, there is effectively no limit to the size of an integer; you can compute an exact value for 1000 factorial. (This bignum feature is recommended but not required in the Scheme standard.)
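As a small illustration of the bignum point, the following sketch computes factorials with nothing but ordinary integer arithmetic. The procedure name factorial is chosen here for illustration, and the exact results for large arguments assume an implementation that provides arbitrary-precision integers, which the standard recommends but does not require.

```scheme
;; Iterative factorial using only standard integer arithmetic.
;; Assumes the implementation supports bignums (recommended, not required).
(define (factorial n)
  (let loop ((i n) (acc 1))
    (if (< i 2)
        acc
        (loop (- i 1) (* acc i)))))

(display (factorial 20))   ; 2432902008176640000, already beyond 32 bits
(newline)

;; 1000! is an exact integer with 2568 decimal digits.
(display (string-length (number->string (factorial 1000))))
(newline)
```

No declaration or special library call marks the point at which the values outgrow a machine word; the same procedure serves for small and large results.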
Many Scheme implementations also support numeric types that are not part of the underlying computer instruction set: exact rational numbers and complex numbers. In all Scheme implementations, the numeric types are organized in a way that reflects mathematical truth rather than hardware details: An integer is also a real number. (To accommodate the realities of computer representation, any Scheme number, of whatever type, can be exact or inexact.)
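The organization of the numeric types can be observed directly with the standard type predicates. This is only a sketch: the rational arithmetic shown assumes an implementation that supports exact rationals, which not every implementation does.

```scheme
;; An integer is also a rational, a real, and a complex number.
(display (list (integer? 3) (rational? 3) (real? 3) (complex? 3)))  ; (#t #t #t #t)
(newline)

;; Exact rational arithmetic (assumes exact rationals are supported).
(display (+ 1/3 1/6))            ; 1/2
(newline)
(display (exact? (+ 1/3 1/6)))   ; #t
(newline)

;; Exactness is independent of type: 2.0 is an inexact integer.
(display (list (integer? 2.0) (exact? 2.0)))  ; (#t #f)
(newline)
```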
In most programming languages, the preferred mechanism for data aggregation (creating a data structure with several component parts) is the array, a fixed-size block of contiguous computer memory. Several famous computer security holes, as well as ordinary program bugs, have resulted from the deliberate or accidental overrun of an array: trying to store more elements in the array than the programmer allocated.
Although Scheme does provide arrays, the preferred data aggregation mechanism is the list, a dynamically allocated structure that can expand or contract during the running of a program. The programmer need not know anything about the actual layout in memory of the list elements; the language implementation handles that automatically. The elements of a list may themselves be lists, so tree-structured data require no special planning.
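A brief sketch of both points follows: lists that grow without any declared size, and a nested list treated as a tree. The names shopping, longer, longest, tree, and count-leaves are invented for this example.

```scheme
;; Lists grow without any declared size.
(define shopping '(milk eggs))
(define longer (cons 'bread shopping))        ; (bread milk eggs)
(define longest (append longer '(tea jam)))   ; (bread milk eggs tea jam)

;; A list whose elements are themselves lists serves as a tree.
(define tree '((1 2) (3 (4 5)) 6))

;; Count the non-list elements (leaves) of a nested list.
(define (count-leaves t)
  (cond ((null? t) 0)
        ((pair? t) (+ (count-leaves (car t))
                      (count-leaves (cdr t))))
        (else 1)))

(display (length longest))     ; 5
(newline)
(display (count-leaves tree))  ; 6
(newline)
```

Note that cons and append build new lists rather than resizing a block of memory, so the original list shopping is unchanged.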
In most languages, non-numeric information is handled through the use of a character data type. An English sentence is represented as a string of characters, in which letters, spaces, and punctuation are all encoded as elements of the string. In Scheme, there is a symbol data type that represents a word as an atomic unit. An English sentence is naturally represented in Scheme as a list of symbols so that a program can examine the sentence word by word without having to scan through a string for space characters. (Characters and strings are, of course, also available in Scheme for situations in which they are the most natural representation.)
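To make the contrast concrete, here is a minimal sketch of a sentence held as a list of symbols, with the same text as a string for comparison. The variable name sentence and the sample words are made up for the example.

```scheme
;; A sentence as a list of symbols; each word is one atomic element.
(define sentence '(the quick brown fox jumps))

(display (car sentence))       ; the
(newline)
(display (length sentence))    ; 5
(newline)

;; memq compares symbols by identity; no character scanning is involved.
(display (if (memq 'fox sentence) 'found 'missing))   ; found
(newline)

;; For contrast, the same sentence as a string is a sequence of 25 characters.
(display (string-length "the quick brown fox jumps"))  ; 25
(newline)
```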
In most languages, each variable may hold only values of a particular type, which the programmer must declare in advance. For example, for any numeric value used in a program, the programmer must decide in advance whether the value will always be an integer or may take non-integer values. The requirement to declare variables was initially used to make languages easier to compile; an expression such as x+y must be compiled into one of two different machine language instructions depending on whether fixed-point (integer) or floating-point (real) arithmetic is needed.
Today, advocates of strongly typed languages argue that type declarations help prevent program bugs and that this justification is more important than the one about helping the compiler. Yet, in most languages, the available type declarations still follow the computer hardware categories. Why is the distinction between integer and non-integer values more valuable as a debugging aid than, for example, the distinction between positive and negative values? If the type declarations are meant to prevent bugs, then the programmer should be able to use any arbitrary predicate function as the type. Compiler efficiency is still, in most languages, the unspoken reason driving the design of the type system. (There are exceptions, such as the language ML, in which typing is both more general and more automatic, without the need for explicit declarations, than in traditionally typed languages such as C++ and Java.)
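The idea of using an arbitrary predicate as a type can be sketched in Scheme itself, where the check happens at run time rather than at compile time. The helper check and the procedure withdraw are hypothetical names introduced for this example, and the error procedure is assumed to be available (it is in most implementations, though not in every version of the standard).

```scheme
;; check is a hypothetical helper, not a standard procedure: it returns
;; the value if the predicate accepts it and signals an error otherwise.
(define (check pred? value)
  (if (pred? value)
      value
      (error "value failed type predicate:" value)))

;; Here positive? plays the role of a type for the amount argument.
(define (withdraw balance amount)
  (- balance (check positive? amount)))

(display (withdraw 100 30))   ; 70
(newline)
;; (withdraw 100 -30) would signal an error instead of returning 130.
```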
One cost of strong typing is that it's difficult, in many languages, to build a data structure whose elements can be of mixed types. For example, most languages allow the programmer to construct an array of integers or an array of reals, but not (without extra effort) an array of some of each. Even more difficult is constructing a hierarchical structure in which an element can be another aggregate, such as this Scheme list:
(4 8.3 "hello" (1 1 2 3 5 8) (3.14159 2 4 6) 7)
This six-element list includes two integers, a non-integer real number, a character string, and two sublists as its elements.
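A program can sort out such a list at run time with the standard type predicates. In this sketch the names mixed and kind are invented for the example; the list is quoted so that it is treated as data, and hello is written as a string literal to match the description above.

```scheme
(define mixed '(4 8.3 "hello" (1 1 2 3 5 8) (3.14159 2 4 6) 7))

;; Name the kind of a single element using standard type predicates.
;; integer? is tested before real? because every integer is also a real.
(define (kind x)
  (cond ((integer? x) 'integer)
        ((real? x) 'real)
        ((string? x) 'string)
        ((list? x) 'list)
        (else 'other)))

(display (map kind mixed))   ; (integer real string list list integer)
(newline)
```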
Many strongly typed languages do provide escape mechanisms so that, with a lot of effort, a programmer can mix data types. The object-oriented programming paradigm has an undeserved reputation as being complicated; most of the complexity of languages such as C++ and Java has nothing to do with their object orientation but comes instead from the type declarations and the mechanisms to work around them. This is a prime example of how Scheme's approach of removing restrictions compares with the piling of feature on top of feature needed in other languages, such as the C++ template mechanism.