When C is used for low-level, down-to-the-bare-metal programming, it is sometimes useful to consider pointers not as abstract reference types but as the machine addresses that they typically are. Although the relationship between pointers and memory addresses and the details of low-level memory addressing are inherently machine- and compiler-dependent, studying them does provide a more concrete model for how pointers work. (Readers uninterested in those details, or who prefer to stick exclusively to portable programming and higher-level abstractions, may skip the rest of this section.)
The internal representation of a C pointer is typically a machine word, of a size appropriate for the machine's addressing architecture (e.g., 16-bit or 32-bit words, for machines with 16- or 32-bit addressing structures). If the global variable g sits at address 0x1234, then the assignment
p = &g;
(where p is a pointer of the appropriate type) usually stores the address 0x1234 in p. (It is for this reason that & is often referred to as the "address-of" operator.)
Pointer arithmetic is straightforward, with one twist. Continuing the preceding example, if g is type int and p is type pointer-to-int, then the expression p + 1 should not result in 0x1235 on a byte-addressable machine, but rather 0x1236 or 0x1238, depending on whether type int is two or four bytes (16 or 32 bits). Therefore, on a byte-addressable machine, the actual interpretation of an expression like p + n at the address level is p + n * sizeof(*p). But it is rarely necessary to worry about the multiplication by sizeof(*p), because under virtually all circumstances, it is taken care of automatically by the compiler. In particular, explicit sizeofs in pointer arithmetic expressions often indicate that a programmer is inappropriately bogged down in assembler-style thinking, and the extra sizeofs are actively wrong if they repeat the scaling that the compiler would do. (In other words, do write p + n in C to compute a new pointer n objects past the one p points to; don't write something like p + n * sizeof(int).)
Nothing in the C language mandates byte addressability, however. It is not at all unthinkable, for example, to implement C on a word-addressed machine (and several such implementations have been successfully undertaken). On a word-addressed machine, an expression like p + 1 would likely add just 1 to p's internal value (assuming p were declared to point to a word-sized data type). However, on such a machine, types int * and char * would themselves have different sizes because char * would need a few more bits to select an individual byte within a word. Moreover, on these machines, conversions between different pointer types (e.g., as requested by pointer casts) are actual conversions; they do much more than just satisfy abstract requirements about type correctness.
Even on machines with flat, byte-addressed memory architectures, alignment restrictions may be significant. Converting, say, an arbitrary char * to an int * may lead to access violations when the int * is used, if the char * happens not to be suitably aligned for an int. How, then, does malloc ensure that its callers will not get access violations when they use the pointers it returns to manipulate objects larger than bytes? The answer is that malloc is written so as to return pointer values suitably aligned for conversion to any type.
Finally, what about null pointers? As far as C is concerned, the only requirement is that the bit pattern that a compiler implementor chooses for a null pointer be one that can never be the address of any actual object or function. It may be desirable to pick an address that causes a memory violation when accessed, to catch programs that inadvertently attempt to use null pointers incorrectly. In fact, address 0 is commonly chosen, and attempts to access this address do indeed cause access violations under many memory-management arrangements. However, this choice is in one way a poor one, as it inevitably leads to a great deal of confusion because of the unavoidable association between the address 0 and the constant 0, which is used to request null pointers in C source code. In point of fact, it is essentially a coincidence that both the source-code and internal representations of null pointers involve the number zero; there is no reason why internal null pointers could not be represented by, say, the bit pattern 06000. (This is not a hypothetical example; there were some old mainframes that used just this bit pattern.) A nonzero internal null pointer representation causes no problems for correctly written C code, though, because the compiler is responsible for generating that bit pattern whenever the programmer uses a constant 0 in a pointer context in source code (including when the zero appears implicitly in constructs such as if(p), or is hidden behind the NULL macro).
When writing embedded code or device drivers, or accessing memory-mapped peripherals or display memory, it is often necessary to refer to an absolute memory address. Such access is relatively straightforward in C because it is possible to use an explicit cast to convert an integer into a pointer. For example, the code
char *p = (char *)0xb8000000;
*p = 'X';
would probably store the character 'X' at address 0xb8000000. The details of a particular compiler's integer-to-pointer conversion are highly machine dependent (especially under segmented memory architectures), but because code that makes use of these conversions is also machine dependent, the machine-dependent techniques are not inappropriate.
C supports three user-defined data structures, along the lines of those found in many languages. C's struct type is a record of related information. A union is a variant record that can hold, at any one time, one value chosen from a selected set of types. Finally, an enum (enumeration) is an integral type with predefined symbolic names for its values.