Single sign-on (SSO) has become something of a Holy Grail in many institutions, to the point that many people think SSO is all there is to identity management. Anyone who has had to remember multiple user IDs and passwords just to use the email systems and file servers at work understands the pain of managing multiple identity credentials.
Beyond causing pain for users, scattered identity data stores cause problems for the business as well. Integrating IT systems is important to businesses because linking the data in multiple data stores adds context about each business activity. For example, linking the customer billing system with the customer service system gives both the employees processing invoices and the employees providing customer service more context about each customer.
For these reasons and more, aggregating identity information and finding the relationship between identity records is important. To aggregate identity data, organizations have four choices:
Build a single central identity data store.
Create a metadirectory that synchronizes data from other identity data stores in the enterprise.
Create a virtual directory that provides a single integrated view of the identity data stores in the enterprise.
Federate directories by tethering identity data stores together.
The first solution is included for completeness, but it's easy to see that creating a single data store of identity data is feasible only for small organizations. The goal of the latter three solutions is to present identity data as if a centralized identity data store exists, even when identity data is distributed across the organization. We'll discuss metadirectories and virtual directories here, but we'll save the discussion of federated identity for Chapter 12.
Metadirectories are collections of directory information drawn from diverse directory sources. The information from those sources is aggregated to provide a single view of the data. As we've pointed out, the modern enterprise maintains information in dozens of directories of all sorts. Not all of that information should be aggregated, but much of it can be, without re-implementing the directories and the applications that depend on them.
In the Utah.gov story I told at the beginning of the chapter, each agency maintained its own directory of its employees and their contact information. Because most of the agencies were using Novell's eDirectory, and Novell had a good metadirectory product, we chose that route to creating an aggregated identity store.
Metadirectories allow the enterprise to collect information from existing directories into a single identity store that can be searched and queried as if all the information were stored in a single directory. Some of the benefits of metadirectory technology include:
A single point of reference provides an abstraction boundary between applications and the actual implementation so that as the organization changes directory vendors, modifies system implementations, or reorganizes data, the applications still query a single source.
A single point of administration reduces the burden of accessing multiple directories with multiple interfaces to maintain the data.
Redundant directory information can be eliminated, reducing the administrative load in managing duplicate data.
There are significant challenges in building a metadirectory. They fall into two primary categories: governance and technical. Governance includes issues such as information ownership, interorganizational administrative responsibilities, namespace choices, data formats and schemas, legal requirements, and information security. The technical issues, while by no means trivial, are relatively straightforward and include the architecture of the metadirectory, namespace normalization, protocols, and procedures for data synchronization.
Metadirectories work through software agents whose job is to gather defined subsets of directory information from the group of directories in the metadirectory's purview. The job may not be as easy as aggregating unconnected records. In fact, the most interesting uses of metadirectories involve the aggregation of attributes about a single subject from multiple directories to form a super record.
Figure 9-4 shows how a metadirectory might aggregate records from two directories. The metadirectory gathers data about the employee's job and status from the HR directory and the employee's payroll record from the Payroll department's directory, then combines them into a record that gives a single view of the employee. Of course, the metadirectory may ignore some attributes in the HR and Payroll directories because they aren't relevant to the purpose for which the metadirectory was set up.
Figure 9-4. Metadirectory aggregating data from HR and Payroll directories using different standards
In this example, we aggregated data from only two sources, but that needn't be the case. We could create a super record for the employee from as many data sources as we have available. An obvious extension to the example in Figure 9-4 would be to include phone information from the phone system and email address from the email system.
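A minimal sketch of building such a super record from any number of sources follows. It assumes every source keys its records by the same employee ID, which real directories often do not, and it stands in plain Python dicts for the directories themselves. The function name `build_super_records`, the source names, and the attribute names are all hypothetical. Each source contributes only the attributes listed as relevant, and every merged attribute remembers which source supplied it.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a directory: employee ID -> attribute dict.
Directory = Dict[str, Dict[str, str]]


def build_super_records(
    sources: List[Tuple[str, Directory, List[str]]],
) -> Dict[str, Dict[str, Tuple[str, str]]]:
    """Merge attributes about the same employee from several directories.

    Each source is (source_name, directory, relevant_attributes). Attributes
    not listed as relevant are ignored. Each merged attribute is stored as
    (value, source_name) so its origin stays visible.
    """
    super_records: Dict[str, Dict[str, Tuple[str, str]]] = {}
    for source_name, directory, relevant in sources:
        for emp_id, record in directory.items():
            merged = super_records.setdefault(emp_id, {})
            for attr in relevant:
                if attr in record:
                    merged[attr] = (record[attr], source_name)
    return super_records


# Hypothetical sample data for three systems.
hr = {"e100": {"name": "Ann Lee", "title": "Analyst", "status": "active", "parkingSpot": "B12"}}
payroll = {"e100": {"salary_grade": "G7", "bank_routing": "redacted"}}
phone = {"e100": {"extension": "4321"}}

records = build_super_records([
    ("HR", hr, ["name", "title", "status"]),
    ("Payroll", payroll, ["salary_grade"]),
    ("Phone", phone, ["extension"]),
])
print(records["e100"])
```

Adding the email system would be one more tuple in the list; nothing else changes.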
Creating such a super record raises the problem of knowing which records in the various directories to combine. Metadirectories copy the information from the underlying directories into a local store, effectively performing a join of the data in the various directory records. The join uses a pre-identified mapping between unique identifiers in the underlying stores, called the join point. Each directory aggregated by the metadirectory is connected to the metadirectory through a channel called a namespace connector that manages the link. Each connector can be individually configured to filter attributes as desired.
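The join-point mechanism can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any product's design: the classes `NamespaceConnector` and `Metadirectory` are hypothetical, each underlying directory is a dict keyed by its own unique ID, and the join point is a pre-built table mapping a metadirectory ID to the ID each directory uses for the same person. Each connector carries its own attribute filter, and `synchronize` copies the filtered, joined data into the metadirectory's local store.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class NamespaceConnector:
    """Hypothetical connector linking one underlying directory to the metadirectory."""
    name: str
    records: Dict[str, Dict[str, str]]  # keyed by this directory's own unique ID
    allowed_attrs: Set[str]             # per-connector attribute filter

    def fetch(self, local_id: str) -> Dict[str, str]:
        """Return only the attributes this connector is configured to pass through."""
        record = self.records.get(local_id, {})
        return {k: v for k, v in record.items() if k in self.allowed_attrs}


@dataclass
class Metadirectory:
    connectors: List[NamespaceConnector]
    # Join point: metadirectory ID -> {connector name: that directory's unique ID}
    join_map: Dict[str, Dict[str, str]]
    # Local store that holds the copied, joined records
    store: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def synchronize(self) -> None:
        """Copy filtered attributes from every connector into the local store."""
        for meta_id, local_ids in self.join_map.items():
            joined: Dict[str, str] = {}
            for connector in self.connectors:
                local_id = local_ids.get(connector.name)
                if local_id is not None:
                    joined.update(connector.fetch(local_id))  # later connectors win on clashes
            self.store[meta_id] = joined


hr = NamespaceConnector("HR", {"E-100": {"cn": "Ann Lee", "title": "Analyst", "homeAddress": "redacted"}}, {"cn", "title"})
payroll = NamespaceConnector("Payroll", {"P-9": {"grade": "G7", "bankAccount": "redacted"}}, {"grade"})
meta = Metadirectory([hr, payroll], {"ann.lee": {"HR": "E-100", "Payroll": "P-9"}})
meta.synchronize()
print(meta.store["ann.lee"])  # {'cn': 'Ann Lee', 'title': 'Analyst', 'grade': 'G7'}
```

Note that HR and Payroll never share an identifier here; the join table alone says that `E-100` and `P-9` are the same person.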
One of the most difficult tasks in creating a metadirectory is eliminating namespace conflicts between the various directories being aggregated and providing a mapping between the metadirectory namespace and the underlying namespaces. More abstract, but just as important, is the problem of mapping schemas from the underlying directories to the metadirectory so that attributes are meaningfully integrated.
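To make the namespace and schema problems concrete, here is a small sketch. The schema maps, base DNs, and function names are hypothetical, and DNs are treated as plain case-sensitive strings, which is a simplification of real LDAP DN comparison. Each source gets a table translating its attribute names into the metadirectory's schema and a rule moving its entries under one shared metadirectory base. Because both sources land in the same namespace, two entries can collide, and `find_conflicts` reports those collisions for someone to resolve.

```python
from typing import Dict, List, Tuple

# Hypothetical schema maps: underlying attribute -> metadirectory attribute.
SCHEMA_MAPS: Dict[str, Dict[str, str]] = {
    "HR": {"surname": "sn", "givenName": "givenName", "jobTitle": "title"},
    "Payroll": {"lastName": "sn", "firstName": "givenName", "payGrade": "grade"},
}

# Hypothetical namespace maps: (underlying base DN, metadirectory base DN).
NAMESPACE_MAPS: Dict[str, Tuple[str, str]] = {
    "HR": ("ou=staff,o=hr", "ou=people,o=meta"),
    "Payroll": ("ou=people,o=payroll", "ou=people,o=meta"),
}


def map_entry(source: str, dn: str, attrs: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Translate one entry's DN and attribute names into the metadirectory's terms."""
    old_base, new_base = NAMESPACE_MAPS[source]
    suffix = "," + old_base
    if not dn.endswith(suffix):
        raise ValueError(f"{dn!r} is outside the {source} namespace")
    new_dn = dn[: -len(suffix)] + "," + new_base
    schema = SCHEMA_MAPS[source]
    # Attributes with no mapping are dropped rather than guessed at.
    new_attrs = {schema[k]: v for k, v in attrs.items() if k in schema}
    return new_dn, new_attrs


def find_conflicts(mapped_dns: List[str]) -> List[str]:
    """Return DNs that more than one source entry maps onto."""
    seen, conflicts = set(), []
    for dn in mapped_dns:
        if dn in seen and dn not in conflicts:
            conflicts.append(dn)
        seen.add(dn)
    return conflicts


hr_dn, hr_attrs = map_entry("HR", "uid=alee,ou=staff,o=hr", {"surname": "Lee", "jobTitle": "Analyst"})
pay_dn, pay_attrs = map_entry("Payroll", "uid=alee,ou=people,o=payroll", {"lastName": "Lee", "payGrade": "G7"})
print(hr_attrs)                          # {'sn': 'Lee', 'title': 'Analyst'}
print(pay_attrs)                         # {'sn': 'Lee', 'grade': 'G7'}
print(find_conflicts([hr_dn, pay_dn]))   # ['uid=alee,ou=people,o=meta']
```

The code can flag the collision but cannot say whether the two entries are the same person (and should be joined) or two people sharing a `uid` (and one must be renamed); that decision belongs to the governance side.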
Data synchronization is another significant problem. When implementing the metadirectory, choices must be made about where data can be modified: only at the directory level, only at the metadirectory level, or at both levels. The right strategy depends on the application, but when data can be changed at both the directory and metadirectory levels, synchronization becomes significantly more complex.
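The sketch below shows why writability at both levels is the hard case. It assumes a hypothetical `WritePolicy` setting per attribute and that the metadirectory remembers the value agreed on at the last sync. When only one level is writable, that level simply wins. When both are writable, it uses a three-way comparison against the last-synced value, a common general technique for detecting concurrent edits, and not a claim about how any particular metadirectory product resolves conflicts.

```python
from enum import Enum
from typing import Optional, Tuple


class WritePolicy(Enum):
    """Hypothetical per-attribute setting for where edits are allowed."""
    DIRECTORY_ONLY = "directory"
    METADIRECTORY_ONLY = "metadirectory"
    BOTH = "both"


def sync_attribute(policy: WritePolicy, last_synced: str,
                   directory_value: str, meta_value: str) -> Tuple[Optional[str], bool]:
    """Decide what value one attribute should hold at both levels after a sync.

    Returns (value, conflict). On conflict the value is None and nothing
    should be written until someone resolves it.
    """
    if policy is WritePolicy.DIRECTORY_ONLY:
        return directory_value, False   # the directory is authoritative
    if policy is WritePolicy.METADIRECTORY_ONLY:
        return meta_value, False        # the metadirectory is authoritative
    # BOTH: compare each side against the value agreed at the last sync.
    dir_changed = directory_value != last_synced
    meta_changed = meta_value != last_synced
    if dir_changed and meta_changed and directory_value != meta_value:
        return None, True               # both sides edited differently
    return (directory_value if dir_changed else meta_value), False


print(sync_attribute(WritePolicy.BOTH, "Analyst", "Manager", "Analyst"))   # ('Manager', False)
print(sync_attribute(WritePolicy.BOTH, "Analyst", "Manager", "Director"))  # (None, True)
```

With a single writable level, no comparison is needed at all, which is why restricting where data can be modified keeps synchronization simple.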
When data can be modified at the metadirectory level, complex authorization schemes may be needed to ensure that information owners can modify only their own data. In the example in Figure 9-4, only designated HR department representatives would be able to update the attributes that flow from the HR directory, and only designated Payroll representatives could update the attributes that flow from the Payroll directory.
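One simple way to express that rule is to record which source directory owns each metadirectory attribute and check the editor's role against that owner. The sketch below assumes hypothetical ownership and role tables (`ATTRIBUTE_SOURCE`, `REPRESENTATIVES`) and denies anything it does not recognize; a production scheme would draw roles from the organization's own authorization system.

```python
from typing import Dict, Set

# Hypothetical ownership map: metadirectory attribute -> source directory that owns it.
ATTRIBUTE_SOURCE: Dict[str, str] = {
    "title": "HR",
    "status": "HR",
    "salary_grade": "Payroll",
}

# Hypothetical role assignments: user -> departments the user represents.
REPRESENTATIVES: Dict[str, Set[str]] = {
    "hr.rep": {"HR"},
    "payroll.rep": {"Payroll"},
}


def can_modify(user: str, attribute: str) -> bool:
    """Allow an update only if the user represents the attribute's owning directory."""
    owner = ATTRIBUTE_SOURCE.get(attribute)
    if owner is None:
        return False  # unknown attributes are denied by default
    return owner in REPRESENTATIVES.get(user, set())


def apply_update(record: Dict[str, str], user: str, attribute: str, value: str) -> None:
    """Write an attribute in a metadirectory record after the ownership check."""
    if not can_modify(user, attribute):
        raise PermissionError(f"{user} may not modify {attribute}")
    record[attribute] = value


record = {"title": "Analyst", "salary_grade": "G7"}
apply_update(record, "hr.rep", "title", "Manager")
try:
    apply_update(record, "hr.rep", "salary_grade", "G9")
except PermissionError as err:
    print(err)  # hr.rep may not modify salary_grade
```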
Because metadirectories can synchronize data bidirectionally, they can be used to populate one directory with attributes from another directory. They can even perform a data-cleansing function, allowing attributes to be validated and then pushed back down to the underlying identity store.
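As an illustration of that cleansing loop, the sketch below takes an email attribute held in the metadirectory, normalizes it, validates it, and pushes valid values down into another directory that lacked them. The function name, the `mail` attribute, and the sample data are hypothetical, and the regular expression is deliberately loose; real address validation is considerably more involved. Lowercasing the whole address is a policy choice made for the example, not a standards requirement.

```python
import re
from typing import Dict, List

# Deliberately simple pattern for illustration only.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def cleanse_and_push(
    meta_store: Dict[str, Dict[str, str]],
    target_directory: Dict[str, Dict[str, str]],
    attribute: str = "mail",
) -> List[str]:
    """Normalize and validate one attribute, then push valid values to a target directory.

    Returns the IDs whose values failed validation and were not pushed.
    """
    rejected: List[str] = []
    for emp_id, record in meta_store.items():
        raw = record.get(attribute)
        if raw is None:
            continue
        cleaned = raw.strip().lower()  # policy choice: treat addresses as case-insensitive
        if not EMAIL_PATTERN.match(cleaned):
            rejected.append(emp_id)
            continue
        record[attribute] = cleaned                                  # keep the cleaned value
        target_directory.setdefault(emp_id, {})[attribute] = cleaned  # populate the other store
    return rejected


meta = {
    "e1": {"mail": "  Ann.Lee@Example.GOV "},
    "e2": {"mail": "not-an-email"},
    "e3": {"cn": "No Mail"},
}
hr_directory = {"e1": {"cn": "Ann Lee"}}
print(cleanse_and_push(meta, hr_directory))  # ['e2']
print(hr_directory["e1"])                    # {'cn': 'Ann Lee', 'mail': 'ann.lee@example.gov'}
```

Rejected values stay untouched so that someone can correct them at the source rather than having bad data copied further.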
Virtual directories are similar in concept to metadirectories in that they create a single directory view from multiple independent directories. They differ in the means used to accomplish this goal. Whereas metadirectories typically use software agents to replicate and synchronize data from various directories in what might be thought of as batch processes, virtual directories create a single view of multiple directories using real-time queries based on mappings from fields in the virtual schema to fields in the physical schemas of the real directories.
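The mapping from virtual schema to physical schemas can be pictured as a lookup table used in both directions: forward to turn a query on a virtual attribute into queries on each backend's own field, and backward to rename the rows that come back. The table `VIRTUAL_SCHEMA`, the backend names, and the field names below are hypothetical; only the idea of a field-to-field mapping comes from the text.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping: virtual attribute -> {backend name: physical field}.
VIRTUAL_SCHEMA: Dict[str, Dict[str, str]] = {
    "uid": {"hr_db": "emp_login", "mail_ldap": "uid"},
    "mail": {"mail_ldap": "mail"},
    "title": {"hr_db": "job_title"},
}


def translate_query(virtual_attr: str, value: str) -> List[Tuple[str, str, str]]:
    """Turn a query on a virtual attribute into (backend, physical_field, value) queries."""
    return [(backend, physical, value)
            for backend, physical in VIRTUAL_SCHEMA.get(virtual_attr, {}).items()]


def to_virtual(backend: str, physical_row: Dict[str, str]) -> Dict[str, str]:
    """Rename one backend row's physical fields to virtual attribute names."""
    reverse = {fields[backend]: vattr
               for vattr, fields in VIRTUAL_SCHEMA.items() if backend in fields}
    # Fields with no virtual counterpart are not exposed.
    return {reverse[k]: v for k, v in physical_row.items() if k in reverse}


print(translate_query("uid", "alee"))
# [('hr_db', 'emp_login', 'alee'), ('mail_ldap', 'uid', 'alee')]
print(to_virtual("hr_db", {"emp_login": "alee", "job_title": "Analyst", "ssn": "redacted"}))
# {'uid': 'alee', 'title': 'Analyst'}
```

Because the mapping is consulted on every request rather than applied once during a batch copy, changing a backend's field name means editing one table entry.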
Another way of looking at the difference is that metadirectories have an associated data store where the directory information from the other directories is kept; virtual directories have no separate data store. They work by turning a single query to the virtual directory into multiple queries to the physical directories and then aggregating the returned results in real time to create the result given to the user. Eliminating the identity store means that virtual directories also eliminate the problems related to data synchronization and replication.
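The fan-out-and-aggregate behavior can be sketched as follows. Each backend is modeled, hypothetically, as a function that answers a lookup by user ID; in practice each would issue an LDAP search, a SQL query, or an API call. The backends are queried concurrently with Python's standard `concurrent.futures` thread pool, and the partial results are merged on the spot. Nothing is stored between calls, so a change in a backend shows up on the very next query with no synchronization step.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical backend: a function answering a lookup by user ID.
Backend = Callable[[str], Dict[str, str]]


def virtual_lookup(user_id: str, backends: Dict[str, Backend]) -> Dict[str, str]:
    """Answer one virtual-directory query by querying every backend at request time.

    Nothing is cached or persisted: each call reflects the backends' current state.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(backends))) as pool:
        futures = [pool.submit(query, user_id) for query in backends.values()]
        results = [f.result() for f in futures]
    merged: Dict[str, str] = {}
    for partial in results:
        merged.update(partial)  # later backends win on attribute-name clashes
    return merged


# Hypothetical in-memory tables standing in for two physical directories.
hr_table = {"alee": {"title": "Analyst"}}
mail_table = {"alee": {"mail": "alee@example.gov"}}
backends = {
    "hr": lambda uid: hr_table.get(uid, {}),
    "mail": lambda uid: mail_table.get(uid, {}),
}

print(virtual_lookup("alee", backends))  # {'title': 'Analyst', 'mail': 'alee@example.gov'}
hr_table["alee"]["title"] = "Manager"    # change made directly in the HR store
print(virtual_lookup("alee", backends)["title"])  # Manager
```

The cost of this design is that every query pays the latency of the slowest backend, and an unavailable backend means missing data rather than slightly stale data.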
In this way, virtual directories present a real-time interface to underlying identity data. Queries and updates happen in real time and are reflected to other applications using the data, whether those applications go through the virtual directory or the underlying stores. Typically, the interface to the virtual directory is LDAP. As with a metadirectory, the connections from the virtual directory to the underlying stores can use any of a number of protocols. As a result, the virtual directory presents a standard view of the data through a standard API.
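The update path can be sketched the same way. In the minimal, hypothetical `VirtualDirectory` class below, each virtual attribute is owned by exactly one backing store, and both reads and writes go straight to that store. Real clients would reach a virtual directory through LDAP search and modify operations; plain method calls stand in for those here, and the stores are in-memory dicts using the same attribute names as the virtual view.

```python
from typing import Dict

Store = Dict[str, Dict[str, str]]  # user ID -> attributes


class VirtualDirectory:
    """Minimal sketch: reads and writes pass straight through to backing stores."""

    def __init__(self, backends: Dict[str, Store], owner: Dict[str, str]) -> None:
        self.backends = backends  # backend name -> store
        self.owner = owner        # virtual attribute -> owning backend name

    def read(self, user_id: str, attribute: str) -> str:
        store = self.backends[self.owner[attribute]]
        return store[user_id][attribute]

    def modify(self, user_id: str, attribute: str, value: str) -> None:
        # There is no local copy to update: the write lands in the owning store.
        store = self.backends[self.owner[attribute]]
        store.setdefault(user_id, {})[attribute] = value


email_system: Store = {"alee": {"mail": "alee@example.gov"}}
phone_system: Store = {"alee": {"telephoneNumber": "555-0100"}}
vd = VirtualDirectory(
    {"email": email_system, "phone": phone_system},
    {"mail": "email", "telephoneNumber": "phone"},
)

vd.modify("alee", "telephoneNumber", "555-0199")
print(phone_system["alee"]["telephoneNumber"])  # 555-0199, seen by apps using the phone system
email_system["alee"]["mail"] = "ann.lee@example.gov"  # edit made in the underlying store
print(vd.read("alee", "mail"))                  # ann.lee@example.gov, seen through the virtual view
```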
Virtual directories are typically used where real-time access to frequently changing identity data is important.