The top-level syntax of a C program consists of a series of global definitions and declarations, of functions, global variables, and data types. For the most part, the syntax is free-form; whitespace and line breaks are ignored except as they serve to separate adjacent tokens. Comments serve as whitespace.
The most significant exception to the free-form rule concerns preprocessor directives. These all begin with the character #, and must stand alone on a line. Both sample programs above contain one example of a preprocessor directive: the line #include <stdio.h>.
(On those rare occasions when it is necessary to continue a logical source line across multiple physical lines, the backslash (\) can be used as a continuation character at the ends of physical lines. Such continuation is generally only needed in preprocessor #define lines; see section 3.8.2.)
We now proceed with detailed explanations of the four fundamental syntactic elements of C programs: declarations (including types and constants), expressions, statements, and functions. Section 3.6 discusses the special topic of pointers, and section 3.7 describes user-defined data structures. Section 3.8 covers the C preprocessor. Section 3.9 introduces the C runtime environment. Finally, section 3.10 lists the functions in C's standard library.
C is a strongly typed language. With a few exceptions (which modern practice discourages making use of), all variables and functions must be declared before use. The reasons for predeclarations of variables and functions are the usual ones: to make life easier on the compiler, to help catch programmer errors (e.g., misspelled names), and to encourage a thoughtful selection of the appropriate type for each variable.
It is useful to think of a type in a somewhat mathematical sense: as a (finite) set of values upon which certain operations may be performed. C has a handful of basic types and several open-ended mechanisms for creating derived types. We cover the basic types first.
The most common types in typical C programs are certainly the integers. C's basic int type holds integers in at least the range ±32,767; type int is typically implemented as a machine word of a natural size, so it is usually the most efficient type in terms of calculation speed and memory bus cycles.
The basic int type can be extended in two orthogonal ways. Firstly, the size modifiers short and long request integers that are potentially smaller and larger than plain int. The short int type has a range of at least ±32,767, which is the same as plain int; the distinction is that short int is likely to be smaller (and so to occupy less memory in arrays or other large data structures) on machines where plain int is larger than its guaranteed minimum range. The type that is guaranteed to be larger is long int, which has a guaranteed minimum range of ±2,147,483,647 (or 2^31 - 1).
From these guaranteed minimum ranges, we can see that types int and short int are both at least 16 bits in size, whereas long int is at least 32 bits. The exact sizes of these types vary from machine to machine and from compiler to compiler, and it is generally difficult to declare a variable with an exact size. It is wise, therefore, not to be too concerned about the exact size of a particular data type; this is an implementation detail best left to the compiler. Similarly, it is implementation-defined how the individual bytes of integer values are arranged in memory, and how negative numbers are represented (e.g., in two's complement or some other representation).
The second mode of extension is that unsigned versions of all of the integer types exist, indicated by the keyword unsigned. The unsigned types hold only nonnegative integers but are of the same size as their signed counterparts. The minimum range of types unsigned int and unsigned short int is 0 to 65,535, and of unsigned long int is 0 to 4,294,967,295 (which is 2^32 - 1). It is also guaranteed that if arithmetic on an unsigned value overflows, or if an attempt is made to store a negative or overlarge value into an unsigned variable, the result is computed modulo one more than the largest value the corresponding type can hold, without incurring any traps or exceptions. (Overflow on signed types, on the other hand, yields undefined behavior.) Thus, on a machine for which the range of unsigned int is 0 to 65,535, all arithmetic yielding an unsigned int result is computed modulo 65,536.
For symmetry, the keyword signed can be used to explicitly indicate a signed integer, although because this is the default for integers, the keyword is infrequently needed. (It can be useful with characters, as mentioned below, and with bit-fields, as discussed in section 3.7.4.)
When one of the modifiers short, long, signed, or unsigned is used, the keyword int is optional. Thus, the types can also be referred to as short, long, unsigned, unsigned short, unsigned long, and so on. (These shorthand forms are considerably more popular, and there is no stigma attached to their use.)
Under some compilers, a doubled modifier long long indicates an extra-long type, typically of at least 64 bits, and this extension is likely to be incorporated into the next revision of the C Standard.