Previous | Table of Contents | Next |
By 1970, the UNIX project had shown enough promise that we were able to acquire the new DEC PDP-11. The processor was among the first of its line delivered by DEC, and three months passed before its disk arrived. Making B programs run on it using the threaded technique required only writing the code fragments for the operators and a simple assembler that I coded in B; soon, dc became the first interesting program to be tested, before any operating system, on our PDP-11. Almost as rapidly, still waiting for the disk, Thompson recoded the UNIX kernel and some basic commands in PDP-11 assembly language. Of the 24 KB of memory on the machine, the earliest PDP-11 UNIX system used 12 KB for the operating system, a tiny space for user programs, and the remainder as a RAM disk. This version was only for testing, not for real work; the machine marked time by enumerating closed knights tours on chess boards of various sizes. Once its disk appeared, we quickly migrated to it after transliterating assemblylanguage commands to the PDP-11 dialect and porting those already in B.
By 1971, our miniature computer center was beginning to have users. We all wanted to create interesting software more easily. Using assembler was dreary enough that B, despite its performance problems, had been supplemented by a small library of useful service routines and was being used for more and more new programs. Among the more notable results of this period was Steve Johnsons first version of the Yacc parsergenerator (Johnson, 1979a).
The machines on which we first used BCPL and then B were word-addressed, and these languages single data type, the cell, comfortably equated with the hardware machine word. The advent of the PDP11 exposed several inadequacies of Bs semantic model. First, its character-handling mechanisms, inherited with few changes from BCPL, were clumsy: Using library procedures to spread packed strings into individual cells and then repack, or to access and replace individual characters, began to feel awkward, even silly, on a byte-oriented machine.
Second, although the original PDP-11 did not provide for floatingpoint arithmetic, the manufacturer promised that it would soon be available. Floating-point operations had been added to BCPL in our Multics and GCOS compilers by defining special operators, but the mechanism was possible only because on the relevant machines, a single word was large enough to contain a floatingpoint number; this was not true on the 16-bit PDP-11.
Finally, the B and BCPL model implied overhead in dealing with pointers: The language rules, by defining a pointer as an index in an array of words, forced pointers to be represented as word indices. Each pointer reference generated a runtime scale conversion from the pointer to the byte address expected by the hardware.
For all these reasons, it seemed that a typing scheme was necessary to cope with characters and byte addressing and to prepare for the coming floating-point hardware. Other issues, particularly type safety and interface checking, did not seem as important then as they became later.
Aside from the problems with the language itself, the B compilers threaded-code technique yielded programs so much slower than their assembly-language counterparts that we discounted the possibility of recoding the operating system or its central utilities in B.
In 1971, I began to extend the B language by adding a character type and also rewrote its compiler to generate PDP-11 machine instructions instead of threaded code. Thus, the transition from B to C was contemporaneous with the creation of a compiler capable of producing programs fast and small enough to compete with assembly language. I called the slightly extended language NB for new B.
NB existed so briefly that no full description of it was written. It supplied the types int and char, arrays of them, and pointers to them, declared in a style typified by
int i, j; char c, d; int iarray[10]; int ipointer[]; char carray[10]; char cpointer[];
The semantics of arrays remained exactly as in B and BCPL: The declarations of iarray and carray create cells dynamically initialized with a value pointing to the first of a sequence of 10 integers and characters. The declarations for ipointer and cpointer omit the size to assert that no storage should be allocated automatically. Within procedures, the languages interpretation of the pointers was identical to that of the array variables: A pointer declaration created a cell differing from an array declaration only in that the programmer was expected to assign a referent, instead of letting the compiler allocate the space and initialize the cell.
Values stored in the cells bound to array and pointer names were the machine addresses, measured in bytes, of the corresponding storage area. Therefore, indirection through a pointer implied no runtime overhead to scale the pointer from word to byte offset. On the other hand, the machine code for array subscripting and pointer arithmetic now depended on the type of the array or the pointer: To compute iarray[i] or ipointer+i implied scaling the addend i by the size of the object referred to.
These semantics represented an easy transition from B, and I experimented with them for some months. Problems became evident when I tried to extend the type notation, especially to add structured (record) types. Structures, it seemed, should map in an intuitive way onto memory in the machine, but in a structure containing an array, there was no good place to stash the pointer containing the base of the array, nor any convenient way to arrange that it be initialized. For example, the directory entries of early UNIX systems might be described in C as
struct { int inumber; char name[14]; };
I wanted the structure not merely to characterize an abstract object but also to describe a collection of bits that might be read from a directory. Where could the compiler hide the pointer to name that the semantics demanded? Even if structures were thought of more abstractly, and the space for pointers could be hidden somehow, how could I handle the technical problem of properly initializing these pointers when allocating a complicated object, perhaps one that specified structures containing arrays containing structures to arbitrary depth?
Previous | Table of Contents | Next |