8.4.1.3. Technicalities
This technique has the disadvantage of relying on a property of C that the language definition does not guarantee.
Consider this example:
    struct string1 *p1;
    struct string2 *p2;
    if (p1->length > 3) p1->data[3] = '?';
    if (p2->length > 3) p2->data[3] = '?';
We assign a new value to the data[3] element of each of our strings only if the length is big enough to accommodate it. Nothing is wrong there, but look at the assignment to p2->data[3]. Here, p2->data is an array with only one element. How can it be legal to assign to element number 3?
On the other hand, the way we are allocating memory for our string2 objects guarantees that there will actually be usable memory at the place where p2->data[3] would have to be if it existed at all. So what is the harm in using it?
I put that question to the C standards committee, pointing out that the answer would determine whether a debugging C implementation would be allowed to reject an assignment to p2->data[3] on the basis of bounds checking. I found out later that my question provoked the longest debate on any such issue that had ever occurred in the C committee. The eventual answer was that the use of p2->data[3] was indeed illegal, whether memory was effectively accessible or not. Of course, implementations would not be required to check for such errors. Because I agreed with the conclusion, I did not inquire about the details. Despite its popularity, the string2 technique is not valid C; programmers who want to write maximally portable code should avoid it even though it is likely to work on most compilers.
8.4.1.4. The Rigorous Nature of C++
So far, we have been talking only about C. It turns out that in C++, there is a much stronger reason to avoid the string2 technique: It will fail horribly in many C++ implementations if the class later acquires a virtual function.
Most C++ programmers know that if a class has a virtual function, the usual implementation technique is for the compiler to store an extra pointer in each object of that class. That pointer contains the address of a table that describes the type of the object and points to all that class's virtual functions.
What is less widely known is that this extra pointer is often stored at the end of the object, not the beginning. Part of the reason for this strategy is that it makes it easier to interchange data between C and C++ programs if the C part of a structure is at the beginning; it then requires no extra step to convert the value of a pointer when passing it between C and C++ programs.
For instance, if we had a structure such as
    struct string3 {
        virtual void foo();
        int length;
        char data[1];
    };
it would be entirely possible that the compiler might decide to store the virtual table pointer immediately after the data member, as if the structure had been declared this way:
    struct string3a {
        int length;
        char data[1];
        virtual_table *vptr;
    };
It is clear that trying to use data[n], where n is greater than 1, would eventually lead to disaster.
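The hidden pointer can be made visible without touching any out-of-bounds memory. The following is a minimal sketch, not part of the original discussion: the names PlainString, VirtualString, and hidden_bytes are hypothetical, and the result depends on the implementation. It only compares object sizes; where the compiler actually places the pointer (at the beginning or at the end) is not something this code can tell you.

```cpp
#include <cstddef>

// Hypothetical stand-in for a structure with no virtual functions.
struct PlainString {
    int length;
    char data[1];
};

// The same data members, plus one virtual function.
struct VirtualString {
    virtual void foo() {}
    int length;
    char data[1];
};

// Extra bytes per object that the implementation added to support the
// virtual function. The value is implementation-dependent; on common
// implementations it is at least the size of one pointer.
const std::size_t hidden_bytes =
    sizeof(VirtualString) - sizeof(PlainString);
```

If hidden_bytes is nonzero, some part of every VirtualString object belongs to the implementation rather than to the programmer, and memory allocated "just past data" may well be that part.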
8.4.1.5. Array Bounds Summary
Formally speaking, array bounds are intended to be taken seriously in both C and C++. C programmers often fail to do so, particularly in ways similar to those shown here. Such transgressions usually go unpunished in practice, but only in C, not in C++. In particular, the presence of virtual functions effectively guarantees that this kind of memory cheating does not work in C++.
The way to do this kind of thing in C++ is to use a library, if an appropriate one is available. If it isn't, then resist the temptation to cheat. Instead, put the memory allocation and deallocation into a constructor and destructor, get it right once, and then forget about it. The resulting class is easier to use than its C counterpart, even if the C version cheats.
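As an illustration of that advice, here is one possible shape such a class might take. It is a sketch under stated assumptions, not the class from this book: the name VarString and its interface are hypothetical, and a production class would need more operations. The point is only that allocation lives in the constructors, deallocation lives in the destructor, and element access respects the actual bounds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical class; the name and interface are illustrative assumptions.
class VarString {
public:
    // Allocate exactly as much storage as the text needs.
    explicit VarString(const char* text)
        : length_(std::strlen(text)), data_(new char[length_ + 1]) {
        std::memcpy(data_, text, length_ + 1);  // includes the '\0'
    }

    // Deep copy, so that two objects never share one buffer.
    VarString(const VarString& other)
        : length_(other.length_), data_(new char[other.length_ + 1]) {
        std::memcpy(data_, other.data_, length_ + 1);
    }

    // Allocate the new buffer before releasing the old one, so the
    // object is left unchanged if allocation fails.
    VarString& operator=(const VarString& other) {
        if (this != &other) {
            char* fresh = new char[other.length_ + 1];
            std::memcpy(fresh, other.data_, other.length_ + 1);
            delete[] data_;
            data_ = fresh;
            length_ = other.length_;
        }
        return *this;
    }

    ~VarString() { delete[] data_; }

    std::size_t length() const { return length_; }

    // Bounds are checked (when assertions are enabled), not ignored.
    char& operator[](std::size_t i) {
        assert(i < length_);
        return data_[i];
    }
    char operator[](std::size_t i) const {
        assert(i < length_);
        return data_[i];
    }

private:
    std::size_t length_;  // declared first: data_ is sized from it
    char* data_;
};
```

With this class, the earlier example becomes `if (s.length() > 3) s[3] = '?';` for a VarString s, with no separately allocated block to size or free by hand. Where the standard library is available, std::string would normally be the "appropriate library" and would make even this class unnecessary.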