Naming is one of the fundamental abstractions for dealing with complexity. Names provide convenient handles for large, complex things—allowing them to be manipulated and referenced by some short, easy-to-remember string, instead of a longer, unwieldy representation. Filenames, for example, let us pick a meaningful handle for what ultimately is a collection of bits located on a particular set of sectors on a particular set of tracks on a particular set of disks.
Names also disassociate the reference from the thing itself, so that the underlying representation can change without changing the name. Perhaps the most familiar example of this is domain names. The domain name windley.com points to some IP address. If I decide to move the services at windley.com to another machine with a different IP address, the change is easily made, and everyone referring to the name will still end up with the services they are looking for.
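To make the indirection concrete, here is a minimal Python sketch of a name table standing in for DNS. The NameTable class is hypothetical, and the addresses come from the 192.0.2.0/24 documentation range rather than being windley.com's real ones. Callers hold on to the name, and moving the service changes only the binding.

```python
class NameTable:
    """A toy stand-in for DNS: it binds names to addresses."""

    def __init__(self) -> None:
        self._bindings: dict[str, str] = {}

    def bind(self, name: str, address: str) -> None:
        # Rebinding replaces the old address; the name itself never changes.
        self._bindings[name] = address

    def resolve(self, name: str) -> str:
        # Look up whatever address the name points to right now.
        return self._bindings[name]


table = NameTable()
table.bind("windley.com", "192.0.2.10")  # hypothetical original host
before = table.resolve("windley.com")

table.bind("windley.com", "192.0.2.20")  # services moved to a new machine
after = table.resolve("windley.com")

print(before, after)  # 192.0.2.10 192.0.2.20
```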
A namespace is the universe within which a name is guaranteed to be unique; it defines where the name has meaning. For this reason, namespaces are sometimes called "domains." A family name (usually) acts as a namespace wherein given names are unique and meaningful. In an email address, the name (the part before the @ symbol) is guaranteed to be unique within the namespace (the part after the @ symbol). Filenames are unique within the namespace of the directory in which they reside.
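A minimal sketch of the email case, assuming a hypothetical Namespace class and the placeholder domains example.com and example.org: the same name can be registered in two different namespaces, but a second registration inside one namespace is refused.

```python
class Namespace:
    """A flat namespace in which each name may be registered only once."""

    def __init__(self, label: str) -> None:
        self.label = label
        self._names: set[str] = set()

    def register(self, name: str) -> str:
        # Uniqueness is enforced only within this namespace.
        if name in self._names:
            raise ValueError(f"{name!r} is already taken in {self.label}")
        self._names.add(name)
        # Qualify the name with its namespace, email-style: name@namespace.
        return f"{name}@{self.label}"


com = Namespace("example.com")
org = Namespace("example.org")

first = com.register("phil")   # same name...
second = org.register("phil")  # ...in a different namespace: allowed

try:
    com.register("phil")       # same name, same namespace: refused
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(first, second, duplicate_rejected)  # phil@example.com phil@example.org True
```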
Namespaces can be flat or hierarchical. The usernames on a standalone computer are an example of a flat namespace. A filesystem is the most familiar example of a hierarchical namespace. Domain names are another familiar example of a hierarchical namespace. Figure 9-1 shows how hierarchical namespaces work in domain names and filesystems.
Hierarchical namespaces have some interesting properties:
A path inside the hierarchy between the root node and a leaf node can be used to specify any entry in a hierarchical namespace.
Some paths are referenced and written from root to leaf (e.g., filesystems) and some are referenced and written from leaf to root (e.g., domain names).
In some hierarchical namespaces, like domain names, a name can be both a node and a leaf. For example, I can reference both windley.com and www.windley.com, with windley serving as a leaf in the first case and as a node in the second.
In other hierarchical namespaces, like filesystems, leaves and nodes are strictly differentiated.
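These properties can be sketched in Python under some simplifying assumptions: domain names are plain dot-separated labels, filesystem paths are absolute POSIX paths, and the domain tree is a nested dictionary in which a hypothetical sentinel key marks names that are themselves entries. Both kinds of name reduce to a root-to-leaf list of components even though a domain name is written in the opposite order, and in the resulting tree windley is both an entry and the parent of www.

```python
RECORD = "<record>"  # hypothetical sentinel: "this name is itself an entry"


def domain_labels(name: str) -> list[str]:
    """Labels of a domain name ordered root to leaf (domains are written leaf to root)."""
    return list(reversed(name.rstrip(".").split(".")))


def file_components(path: str) -> list[str]:
    """Components of an absolute POSIX path, which is already written root to leaf."""
    return [part for part in path.split("/") if part]


def build_domain_tree(names: list[str]) -> dict:
    """Insert each name into a nested dict, marking the node where the name ends."""
    root: dict = {}
    for name in names:
        node = root
        for label in domain_labels(name):
            node = node.setdefault(label, {})
        node[RECORD] = True
    return root


print(domain_labels("www.windley.com"))         # ['com', 'windley', 'www']
print(file_components("/home/phil/notes.txt"))  # ['home', 'phil', 'notes.txt']

tree = build_domain_tree(["windley.com", "www.windley.com"])
windley = tree["com"]["windley"]
print(RECORD in windley, "www" in windley)      # True True: an entry and a node
```

A filesystem would not permit the same overlap for the hypothetical notes.txt: as a regular file it is a leaf and cannot also contain entries.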
In many hierarchical namespaces, the hierarchy reflects some actual hierarchy in the physical world. Usually, however, the hierarchy in the namespace and the organization of the objects it represents do not have a one-to-one correspondence. For example, a filesystem is a hierarchy that exists entirely independently of the location of the bits on the disk and is strictly for the convenience of the user. With domain names, the hierarchy sometimes mirrors the physical world, but not always. There really is an organization called Yahoo! that owns yahoo.com. On the other hand, ftp.windley.com, www.windley.com, and mail.windley.com are all the same machine.
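The mismatch between the name hierarchy and the machines behind it shows up when a name-to-address table is inverted. The three windley.com names come from the text; the shared address 192.0.2.10 is a hypothetical documentation address.

```python
from collections import defaultdict

# Three names in the domain hierarchy (the address is hypothetical).
bindings = {
    "ftp.windley.com": "192.0.2.10",
    "www.windley.com": "192.0.2.10",
    "mail.windley.com": "192.0.2.10",
}

# Invert the table: which names land on each machine?
machines: defaultdict[str, list[str]] = defaultdict(list)
for name, address in bindings.items():
    machines[address].append(name)

print(len(bindings), len(machines))  # 3 1
print(sorted(machines["192.0.2.10"]))
# ['ftp.windley.com', 'mail.windley.com', 'www.windley.com']
```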
As I write this, I've just ordered a new laptop from Apple for my wife for Christmas. One of the items in the confirmation email I received from Apple was a URL to the package-tracking page at FedEx. Have you ever thought of this package-tracking page as the homepage for that package on the Internet? Every package shipped via FedEx, UPS, and most other companies has a homepage that is named by the URI (Uniform Resource Identifier) used to reference it. The URI identifies a unique location on the Web, and that URI can be linked in another document or bookmarked for later reference, just like any other web page. In that regard, the package "homepage" is no different from any other homepage on the Internet.
URIs are more general versions of URLs, or Uniform Resource Locators, the "web page address" that you type into your browser's address bar. Whereas URLs represent locations and, as such, typically correspond to real resources on the Internet, URIs can be used to name things within a single, global namespace even when there's no web location associated with the name. The structure is the same, however, and so many URIs also function as URLs.
URIs are one of the most important features of the Web. Without URIs, much of what we take for granted on the Web wouldn't work. As a simple example, having a universal namespace created using URIs allows any document, anywhere on the Web, to refer to any other document, anywhere on the Web, without the authors of the two documents having to agree on the same software package or server, beyond what's inherent in the Web itself. In fact, Paul Prescod has said: "If there is one thing that distinguishes the Web as a hypertext system from the systems that preceded it, it is the Web's adoption of a single, global, unified namespace."[*]
Apart from their use to identify resources on the Web, however, URIs are finding their way into many other contexts, because the URI system represents a universal namespace. Giving off-web resources, such as database records, a URI makes them part of this same universal namespace and ensures that they can be uniquely distinguished from other resources.
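As a minimal sketch of bringing an off-web resource into this namespace, the function below mints a URI for a database row from its table name and primary key. The base http://example.com/records and the record_uri helper are hypothetical; in practice the authority would be a domain you control. Percent-encoding with urllib.parse.quote keeps an awkward key from altering the URI's structure.

```python
from urllib.parse import quote

BASE = "http://example.com/records"  # hypothetical authority and path prefix


def record_uri(table: str, key: str) -> str:
    """Mint a URI for a database row from its table name and primary key."""
    # safe="" makes quote() encode "/" as well, so a key containing a slash
    # stays one path segment instead of adding a level to the hierarchy.
    return f"{BASE}/{quote(table, safe='')}/{quote(key, safe='')}"


print(record_uri("customers", "42"))      # http://example.com/records/customers/42
print(record_uri("orders", "2024/0017"))  # http://example.com/records/orders/2024%2F0017
```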
URLs and URIs have three major components:
A protocol identifier (formally, the scheme) followed by a colon (e.g., http:).
A domain name indicating a unique computing domain on the Internet (e.g., www.windley.com).
A path component, optionally followed by a query string, indicating what specific resource in that domain is to be identified (e.g., /llp?ln=windley&lang=en).
Taken together, these components are written in the familiar fashion:
http://www.windley.com/llp?ln=windley&lang=en
There can be other components as well, including authentication information, port numbers, etc., but these three are the most common.
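Python's standard urllib.parse module can pull these components apart, which is a quick way to check how a URI will be interpreted. Note that urlsplit reports the query string separately from the path, whereas the list above treats both as the path component; the :8080 port below is a hypothetical example of one of the less common components.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit

url = "http://www.windley.com/llp?ln=windley&lang=en"
parts = urlsplit(url)

print(parts.scheme)           # http
print(parts.hostname)         # www.windley.com
print(parts.path)             # /llp
print(parts.query)            # ln=windley&lang=en
print(parse_qs(parts.query))  # {'ln': ['windley'], 'lang': ['en']}

# Less common components, such as an explicit port, are parsed too.
print(parts.port)                                        # None
print(urlsplit("http://www.windley.com:8080/llp").port)  # 8080

# Reassembling the parts gives back the original URL.
print(urlunsplit(parts) == url)  # True
```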
The URI is the public interface to a resource and, consequently, deserves great thought. One of the key factors that should be kept in mind when designing URIs is that they should not change—ever. This is not such a radical idea if you stop to consider that the URI is the name of the resource. In general, it's a bad idea to change the name of something, because we cannot possibly know all the places where the name is being used and, consequently, have no way of notifying those places when the name (the URI) changes. Thus, the URI should be chosen so that it is meaningful and unlikely to change. As the system is updated and maintained, the non-volatility of the URIs should be preserved. Numerous tools and techniques exist to make this possible. URL rewriting is one of the most powerful, allowing servers to resolve URI references to almost any resource.
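A minimal sketch of URL rewriting, in the spirit of server modules such as Apache's mod_rewrite but written in Python rather than in any server's configuration syntax. The rule table, the paths, and the rewrite function are all hypothetical. The public paths on the left are the stable names; when the site is reorganized, only the internal targets on the right change.

```python
import re
from typing import Optional

# Hypothetical rules: stable public path patterns -> current internal locations.
RULES = [
    (re.compile(r"/articles/(\d{4})/([a-z-]+)"), r"/cms/render?year=\1&slug=\2"),
    (re.compile(r"/about"), r"/static/pages/about.html"),
]


def rewrite(public_path: str) -> Optional[str]:
    """Map a public path to an internal location, or None if no rule matches."""
    for pattern, target in RULES:
        match = pattern.fullmatch(public_path)
        if match:
            # expand() substitutes the captured groups into the target template.
            return match.expand(target)
    return None


print(rewrite("/articles/2004/naming"))  # /cms/render?year=2004&slug=naming
print(rewrite("/about"))                 # /static/pages/about.html
print(rewrite("/missing"))               # None
```

Keeping the mapping in one table means a reorganization touches the rules, not the URIs that other documents and bookmarks already use.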
Designing the URIs for your information system should be one of the most important tasks of the design phase. It may seem unusual to think of designing URIs. After all, don't we just let the network folks tell us our domain name and let the path fall out however it may? Not in a well-designed system. The last section talked about the three components that are typically part of a URL. All three are usually under our control and should be carefully chosen.
Don't construe this principle to mean that all resources need to be permanent. Just because URIs don't change doesn't mean that the resource always has to be available. Some resources are transitory, and some go out of existence. Even so, we shouldn't change their names.
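One way to honor this is sketched below with a hypothetical UriRegistry class and URI path: when a resource goes away, its URI stays reserved and answers with HTTP status 410 Gone instead of being handed to something new. Answering 410 rather than 404 is a design choice that tells clients the resource existed and its removal is likely permanent.

```python
from http import HTTPStatus
from typing import Optional


class UriRegistry:
    """Hypothetical registry in which a URI, once minted, is never reassigned."""

    def __init__(self) -> None:
        self._live: dict[str, str] = {}
        self._retired: set[str] = set()

    def mint(self, uri: str, resource: str) -> None:
        # Refuse any URI that has ever been used, live or retired.
        if uri in self._live or uri in self._retired:
            raise ValueError(f"{uri} was already used; names are never reused")
        self._live[uri] = resource

    def retire(self, uri: str) -> None:
        # The resource goes away, but its name stays reserved.
        del self._live[uri]
        self._retired.add(uri)

    def lookup(self, uri: str) -> tuple[HTTPStatus, Optional[str]]:
        if uri in self._live:
            return HTTPStatus.OK, self._live[uri]
        if uri in self._retired:
            return HTTPStatus.GONE, None  # 410: existed, now deliberately gone
        return HTTPStatus.NOT_FOUND, None


registry = UriRegistry()
registry.mint("/promotions/holiday-sale", "holiday sale page")
registry.retire("/promotions/holiday-sale")

status, _ = registry.lookup("/promotions/holiday-sale")
print(status.value)  # 410

try:
    registry.mint("/promotions/holiday-sale", "something else")
except ValueError as err:
    print(err)  # /promotions/holiday-sale was already used; names are never reused
```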
[*] Prescod, Paul. "Roots of the REST/SOAP Debate," http://www.prescod.net/rest/rest_vs_soap_overview/.