Chapter 16. Identity Data Architectures

As part of the State of Utah's enterprise architecture efforts, we completed an inventory of data repositories in the state in early 2002. There were over 250 databases that contained information about individuals and over 175 that contained records about businesses, and this didn't include spreadsheets, Access databases, and other minor repositories. The problem was that we couldn't really tell which records in one database were related to records in another.

Many people are happy to hear that their government can't link database records, but it hampers many of the electronic government services that citizens would like to see. For example, there are over 20 different databases that keep track of health information for children in Utah. The end result is that when a mother brings her newborn in for immunization, the receptionist can't say, "I see that Johnny hasn't had his hearing and vision checked. Would you like me to schedule an appointment?" That sort of simple customer relationship management can't happen when you keep information about your customers in 250 different places.

As you think about the data in your organization, you'll probably find this story resonating with your own experience. Most organizations have volumes of data that have simply grown and expanded over time in a largely unguided fashion. While we like to think of databases as large repositories of multipurpose information, most databases are simply persistent data storage for a single application.

Data is important in an identity management architecture (IMA), because identities are usually stored on computers as digital records of some kind. This chapter is about building data architectures for identity data. A data architecture links data to specific business goals and processes, categorizes it, identifies metadata, and defines important details about how it is represented.

Building a data architecture for identities requires that we consider three different concepts: categorizing identity data, exchanging identity data, and structuring identity data. Figure 16-1 shows these three components along with some of the issues we'll address for each.


As we discussed in Chapter 2, a digital identity is a record that contains one or more names as well as attributes, preferences, and traits of some person or thing. For our purposes in this chapter, we'll restrict that definition to records that contain some unique identifier. Moreover, we'll just refer to the preferences, traits, and attributes as "properties."

These identities might refer to people, applications, manufactured goods, or other things that the organization cares about. If I asked you to list the identities in your organization, you might include only records that identify people such as employee and customer records. You might miss billing records and might not even think of manufacturing data as a kind of identity data.
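To make the working definition concrete, here is a minimal sketch in Python of an identity record as described above: a unique identifier, one or more names, and a bag of properties. The class name `IdentityRecord`, its field names, and the sample values are my own assumptions for illustration; they are not drawn from any particular system.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class IdentityRecord:
    """Illustrative identity record: unique identifier, names, and properties."""

    identifier: str                 # must be unique within its repository
    names: list[str]                # one or more names for the subject
    properties: dict[str, Any] = field(default_factory=dict)  # attributes, preferences, traits
    kind: str = "person"            # hypothetical tag: "person", "application", "product", ...

    def __post_init__(self) -> None:
        # The chapter's restricted definition requires a unique identifier
        # and at least one name, so reject records that lack either.
        if not self.identifier:
            raise ValueError("an identity record needs a unique identifier")
        if not self.names:
            raise ValueError("an identity record needs at least one name")


# Identities are not limited to people: the same shape fits a manufactured good.
employee = IdentityRecord("E-1001", ["Jane Doe"], {"department": "Finance"})
pump = IdentityRecord("SN-88231", ["Model 40 pump"], {"plant": "Ogden"}, kind="product")
```

The point of the sketch is only that employee records, billing records, and manufacturing records can all be seen as the same kind of thing once you look for the identifier and the properties attached to it.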

We commonly think of "identity management" as being about authenticating and authorizing people to take certain actions, but for the purposes of this chapter, we need to expand this definition.

We frequently hear the comment that an organization's data is one of its most valuable assets. I think that's probably true, but the fact is that data projects never go anywhere. There's never money for a project to clean up the data and create enterprise data repositories. What do businesses care about? Processes, because processes achieve business results. Let me relate a story that illustrates this.

Utah, a relatively small state, manages over $1 billion per year in federal government welfare benefits. Each state is responsible for ensuring that welfare recipients are eligible for the benefits that it distributes. The majority of these benefits are distributed in four major programs that have different eligibility requirements. The job is made more interesting by eligibility interactions: your eligibility for one program could be affected by benefits from another.

Utah, like most other states, had four different computer systems that automated at least part of the eligibility determination process. The lack of a common identity across these four systems was costing the state real money every year in lost time and improperly paid benefits. Utah had tried years before to create a master identifier and then use that identifier in the four separate data repositories so that records could be linked. That had limited success, largely because it relied on the ability of benefits coordinators to link the individuals across four separate systems as they submitted applications.
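The master-identifier approach can be pictured as a cross-reference table that maps each system's local record key to a shared identifier. The sketch below is a hypothetical illustration in Python, not a description of Utah's actual systems; the system names, record keys, and function names are all invented.

```python
from collections import defaultdict

# Hypothetical crosswalk: (system name, local record id) -> master identifier.
# In the scheme described above, each entry exists only because a person
# made the link while submitting an application.
crosswalk = {
    ("program_a", "A-17"): "M-001",
    ("program_b", "B-204"): "M-001",
    ("program_c", "C-9"): "M-002",
}


def link_records(crosswalk):
    """Group local record keys by the master identifier they share."""
    linked = defaultdict(list)
    for local_key, master_id in crosswalk.items():
        linked[master_id].append(local_key)
    return dict(linked)


def unlinked(all_local_keys, crosswalk):
    """Return local records that were never mapped to a master identifier."""
    return [key for key in all_local_keys if key not in crosswalk]


all_keys = list(crosswalk) + [("program_d", "D-55")]
groups = link_records(crosswalk)
orphans = unlinked(all_keys, crosswalk)
```

Here `groups` ties the `program_a` and `program_b` records to the same client, while the `program_d` record ends up in `orphans` because nobody linked it. Every record left in that list is one the master identifier cannot help with, which is why a scheme that depends on manual linking only goes so far.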

The IT departments in the three agencies that administer these four programs could never make the case to business leaders to invest in a common identifier, even though there was real need. What ultimately solved the problem was a proposal to revamp the processes that these agencies used to determine eligibility. The new processes provided more decision support to eligibility agents, making their jobs easier and less subjective. That's something the business could understand. Utah decided to build a new eligibility system that served all four programs, at a cost of tens of millions of dollars. Of course, the new system uses a single repository for client data, solving the problems with multiple identities.

When we build data architectures, we're doing so in support of business processes. Business leaders in Utah's health and human service agencies could relate to the process problems that their employees faced and saw real advantage (millions of dollars' worth) in updating the systems that served those processes, so they jumped on board. Process improvement drove the data architecture improvement.

The last chapter talked about using an identity management maturity model to improve processes. Our approach to data architectures is to drive identity data integrity by incrementally making the use of identities more consistent across business processes. We'll use the process inventory as a starting point and process improvement as a driver for changes in the data architecture.