Previous | Table of Contents | Next |
8.3.6.2. Order-Based Containers
Now lets look at library containers. The standard C++ library offers a number of containers, and related algorithms, that assume the ability to do order-based comparisons on the elements in the containers. One such container is set<T>, which has the property that an object of type set<T> contains objects of type T, and that every element of any particular set<T> object is unequal to every other element of that object.
Now consider objects of type set<int*>. It is certainly desirable to be able to have a set of integer pointers. It should not be hard to imagine a lot of uses for such sets. Yet without special pleading in the library, such a set works only if it contains only pointers to elements of the same array. The reason should not be hard to find: If a set<int*> object contains pointers to elements of two different arrays, the library evokes undefined behavior the moment it tries to compare two such pointers.
The problem is even worse than it appears at first because set<int*> objects work just fine on any machine that happens to implement pointer comparisons so that they work correctly on all pointers. The number of such machines is large and growing rapidly. How is it possible to define a container class that will work correctly across all machines?
Some people may claim there is no problem: Instead of comparing p<q, just compare (int)p<(int)q. That isnt really a solution, though, because there is no guarantee that it is possible to convert a pointer to an int (or even to a long), or, if it is, that the result of that conversion has sensible properties with respect to comparison.
Other people might claim that this example points out a fundamental limitation of order-based containers, and this problem would be entirely avoided if we had hash-based containers instead. But hash-based containers are useless unless the elements have a hash function defined, which means that container elements of user-defined type must have a user-defined hash function as well. In effect, this solution works by dumping the problem on the users shoulders. How does a user go about defining a portable hash function on pointers if conversion from pointer to integer is not always defined under every implementation?
8.3.6.3. A Likely Solution
The key to the solution is to realize that, although different machine architectures might offer different ways of comparing pointers, every machine must make it possible somehow. After all, it is already guaranteed that two pointers are equal if and only if they point to the same object (or are both zero), which already rules out assigning the same address (whatever an address is on the machine) to two different objects. All we need, then, is a way to name the correct comparison between pointers, along with a requirement that the library make that kind of comparison available with the same name for all implementations.
The C++ standard library does exactly that and makes the comparisons available through function object templates. To compare two integer pointers, we use a function object of type less<int*> so that even if p<q is not guaranteed to work correctly for arbitrary p and q, less<int*>()(p,q) provides an appropriate guarantee. Here, the first pair of parentheses constructs an object of type less<int*>, and the second pair calls that object with arguments p and q. Of course, the call is actually an invocation of operator() on the object, but from the users viewpoint, it looks like a call.
The guarantee that less<int*>() offers is that whenever p<q is guaranteed to be well defined, less<int*>()(p,q) has the same value. Moreover, less<int*>()(p,q) is a total order relation over all possible values of p and q.
With those definitions and guarantee, it should be obvious that the right way to define set<T> is to use less<T> as the normal comparison operation for the container, rather than just to use the < operation. Then all that is necessary is for less<T> to use < when less<T> is not overridden by a specialized definition for particular types (such as pointer types).
For example, an implementation in which < works for all pointers might define a template along the following lines:
template<class T> class less: public binary_function<T, T, bool> { public: bool operator()(T x, T y) { return x < y; } };
In this example, the binary_function base class is there to make our less class conform to library conventions that are irrelevant to this chapter. What is important is the definition of operator(), which simply defines class less so that calling an object of type less<T> with two arguments of type T compares the arguments.
Now, suppose we are on an implementation where < does not work with pointers to objects in different arrays, but on which it is possible to simulate such comparisons by casting the pointers to long. On such an implementation, the library must define a partial specialization of the less template, along the following lines:
template<classT> class less<T*>: public binary_function<T*, T*, bool> { public: bool operator()(T* x, T* y) { return (long) x < (long)y; } };
This specialization defines the less template for all pointer types. The net effect of the specialization is to say that less<T>(x,y) means (long)x<(long)y whenever T is a pointer type and x<y otherwise.
It is important to realize that these implementation-dependent definitions are part of the library for each individual machine, rather than something that users generally have to consider on their own initiative. Ultimately, the user will apply a library function, such as sort, to an array, and it will work correctly whether that array contains pointers, integers, or objects of some other type.
Previous | Table of Contents | Next |