

In contrast to the pervasive syntax variation that occurred during the creation of B, the core semantic content of BCPL (its type structure and expression evaluation rules) remained intact. Both languages are typeless, or rather have a single data type, the word or cell, a fixed-length bit pattern. Memory in these languages consists of a linear array of such cells, and the meaning of the contents of a cell depends on the operation applied. The + operator, for example, simply adds its operands using the machine’s integer add instruction, and the other arithmetic operations are equally unconscious of the actual meaning of their operands. Because memory is a linear array, it is possible to interpret the value in a cell as an index in this array, and BCPL supplies an operator for this purpose. In the original language this operator was spelled rv, and later !, whereas B uses the unary *. Thus, if p is a cell containing the index of (or address of, or pointer to) another cell, *p refers to the contents of the pointed-to cell, either as a value in an expression or as the target of an assignment.

Because pointers in BCPL and B are merely integer indices in the memory array, arithmetic on them is meaningful: If p is the address of a cell, then p+1 is the address of the next cell. This convention is the basis for the semantics of arrays in both languages. When one writes in BCPL

   let V = vec 10

or in B,

   auto V[10];

the effect is the same: A cell named V is allocated, then another group of 10 contiguous cells is set aside, and the memory index of the first of these is placed into V. By a general rule, in B, the expression

   *(V+i)

adds V and i and refers to the i-th location after V. BCPL and B both add special notation to sweeten such array accesses; in B, an equivalent expression is

   V[i]

and in BCPL

   V!i

This approach to arrays was unusual even at the time; C would later assimilate it in an even less conventional way.

None of BCPL, B, or C supports character data strongly in the language; each treats strings much like vectors of integers and supplements general rules by a few conventions. In both BCPL and B, a string literal denotes the address of a static area initialized with the characters of the string, packed into cells. In BCPL, the first packed byte contains the number of characters in the string; in B, there is no count and strings are terminated by a special character, which B spelled *e. This change was made partly to avoid the limitation on the length of a string caused by holding the count in an 8- or 9-bit slot, and partly because maintaining the count seemed, in our experience, less convenient than using a terminator.

Individual characters in a BCPL string were usually manipulated by spreading the string out into another array, one character per cell, and then repacking it later; B provided corresponding routines, but people more often used other library functions that accessed or replaced individual characters in a string.

2.2.3. More History

After the TMG version of B was working, Thompson rewrote B in itself (a bootstrapping step). During development, he continually struggled against memory limitations: Each language addition inflated the compiler so it could barely fit, but each rewrite taking advantage of the feature reduced its size. For example, B introduced generalized assignment operators, using x=+y to add y to x. The notation came from Algol 68 (van Wijngaarden, et al., 1975) via McIlroy, who had incorporated it into his version of TMG. (In B and early C, the operator was spelled =+ instead of +=; this mistake, repaired in 1976, was induced by a seductively easy way of handling the first form in B’s lexical analyzer.)

Thompson went a step further by inventing the ++ and -- operators, which increment or decrement; their prefix or postfix position determines whether the alteration occurs before or after noting the value of the operand. They were not in the earliest versions of B but appeared along the way. People often guess that they were created to use the autoincrement and autodecrement address modes provided by the DEC PDP-11 on which C and UNIX first became popular. This is historically impossible because there was no PDP-11 when B was developed. The PDP-7, however, did have a few autoincrement memory cells with the property that an indirect memory reference through them incremented the cell. This feature probably suggested such operators to Thompson; the generalization to make them both prefix and postfix was his own. Indeed, the autoincrement cells were not used directly in the implementation of the operators, and a stronger motivation for the innovation was probably his observation that the translation of ++x was smaller than that of x=x+1.

The B compiler on the PDP-7 did not generate machine instructions but instead generated threaded code (Bell, 1972), an interpretive scheme in which the compiler’s output consists of a sequence of addresses of code fragments that perform the elementary operations. The operations typically, and in B in particular, act on a simple stack machine.

On the PDP-7 UNIX system, little besides B itself was written in B, because the machine was too small and too slow to do more than experiment; rewriting the operating system and the utilities wholly in B was too expensive a step to seem feasible. At some point, Thompson relieved the address-space crunch by offering a virtual B compiler that allowed the interpreted program to occupy more than 8 KB by paging the code and data within the interpreter, but it was too slow to be practical for the common utilities. Still, some utilities written in B appeared, including an early version of the variable-precision calculator dc familiar to UNIX users (McIlroy & Kernighan, 1979). The most ambitious enterprise I undertook was a genuine cross-compiler that translated B to GE-635 machine instructions, not threaded code. It was a small tour de force: a full B compiler, written in its own language and generating code for a 36-bit mainframe, that ran on an 18-bit machine with 4K words of user address space. This project was possible only because of the simplicity of the B language and its runtime system.

Although we entertained occasional thoughts about implementing one of the major languages of the time such as Fortran, PL/I, or Algol 68, such a project seemed hopelessly large for our resources: Much simpler and smaller tools were called for. All these languages influenced our work, but it was more fun to do things on our own.

