Identity Data Structure and Metadata

The inventory has created a database of the identity data in use, and the audit has gathered a lot of information about each data source. One of our goals is to establish an authoritative source for each identity. One of the barriers is inconsistent usage and content for identities that should be the same. There are five basic ways that identities can be inconsistent across different data sources:

  • Inconsistent use of identifiers for the same data. Your organization might have several data records that are used to store customer identities. The identifier in one might be the database record number, another might use the SSN, and a third might use a company-assigned customer number.

  • Inconsistent values for the same data in a field. The customer's name might be stored as a single value in one identity record, split into first and last name in a second, and a third might store middle initial and suffix.

  • Inconsistent names for fields that carry the same data. One identity record may store the customer's last name in a field called lname and another might call it lastname. These are called synonym fields.

  • Inconsistent meaning for field names. Two identity records might use the field name phone, but one might use it to store the phone number of the customer and the other might use it as a flag to indicate that the salesperson should make a follow-up phone call to the customer. These are called homonym fields.

  • Inconsistent representation. One customer identity record might store the customer number as a number and another might store it as a series of characters.
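Two of these inconsistencies, synonym fields and inconsistent representation, can be reconciled mechanically once the owners agree on a common format. The following is a minimal sketch of that idea, not a prescribed implementation; the field names (lname, LastName, custno), the canonical names, and the sample records are all hypothetical.

```python
# Hypothetical mapping from source-specific field names (synonyms)
# to the canonical names the data owners agreed on.
SYNONYMS = {
    "lname": "last_name",
    "lastname": "last_name",
    "custno": "customer_number",
    "customer_number": "customer_number",
}


def normalize(record: dict) -> dict:
    """Return a copy of record with canonical field names and types."""
    out = {}
    for field, value in record.items():
        canonical = SYNONYMS.get(field.lower(), field.lower())
        out[canonical] = value
    # Inconsistent representation: one source stores the customer number
    # as a number, another as characters. Assume characters as the target.
    if "customer_number" in out:
        out["customer_number"] = str(out["customer_number"]).strip()
    return out


# Two hypothetical records describing the same customer.
crm_record = {"lname": "Smith", "custno": 10432}
billing_record = {"LastName": "Smith", "customer_number": "10432 "}

assert normalize(crm_record) == normalize(billing_record)
```

Homonym fields cannot be fixed this way: renaming does not help when two sources use the same name, such as phone, for different things. Those cases still need the owners to decide what the field means.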

As you identify elements of identity records that have the same business purpose, you have the opportunity to resolve these inconsistencies. In most organizations, this will require meetings between the respective owners and custodians of the identity records to hash out a common format for the data elements and choose an authoritative source. The owner of the authoritative source becomes the owner of this data element.

Authoritative metadata records are stored in a metadata repository. As the data elements of identity records are made consistent, metadata records for those elements should be entered in the metadata repository. New projects should be encouraged to use the metadata repository to find identity information they need. Projects should also submit new data element definitions to the metadata repository as they are defined so that other projects can use existing data element definitions instead of creating new inconsistencies that have to be ironed out later.
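The find-before-you-define workflow described here can be illustrated with a very small in-memory registry. This is only a sketch under stated assumptions: the class name MetadataRepository and its find and submit methods are hypothetical, and a real repository would be a shared, persistent service rather than a Python dictionary.

```python
class MetadataRepository:
    """Hypothetical in-memory registry of data element definitions."""

    def __init__(self):
        self._elements = {}

    def find(self, name):
        """Return the existing definition for name, or None if undefined."""
        return self._elements.get(name.lower())

    def submit(self, name, definition, owner):
        """Register a new data element; refuse to create a duplicate."""
        key = name.lower()
        if key in self._elements:
            raise ValueError(
                f"{name!r} is already defined; reuse the existing definition"
            )
        self._elements[key] = {
            "name": name,
            "definition": definition,
            "owner": owner,
        }
        return self._elements[key]


repo = MetadataRepository()
repo.submit("SSN", "Nine-digit number identifying an individual", "HR manager")

# A later project looks the element up instead of defining its own.
existing = repo.find("ssn")
print(existing["owner"])
```

Rejecting a duplicate submission is one possible design choice; it pushes a project toward the existing definition rather than letting a second, slightly different one into the repository.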

Even if all of the various data sources can't be made consistent right now, it's useful to create the metadata record to store the common format as a goal state so that existing projects can refer to it as legacy systems are maintained and updated.

Table 16-2 shows the metadata fields, describes their purpose, and gives examples of what might be stored in each field for a Social Security number (SSN) element in a hypothetical organization.

Table 16-2. Metadata for identity data elements

| Field | Purpose | SSN example |
| --- | --- | --- |
| Data element name | The field name for this data element | SSN |
| Element definition | The human definition of this data element | A nine-digit number used by the Social Security Administration and IRS to identify individuals |
| Owner | The individual (usually by role) who owns this data element | HR manager |
| Scope | The geographic or organizational scope where this data element has meaning | U.S. |
| Business format | The standard business requirement for the format of this data element | 999-99-9999 |
| Business length | The length of the standard business format | 11 characters |
| Business type | The common way that business thinks of this data element | Number |
| Usage and audit requirements | Any business restrictions on using this data or requirements for auditing it, such as privacy policy, HIPAA, etc. | Example Company, Inc. Policy on Privacy |
| Exchange format | The standard format for exchanging this data in the organization | 999999999 |
| Exchange length | The length of the exchange format | 9 bytes |
| Exchange type | The type of the exchange format | Character |
| Storage format | The format for storing this data element | 999999999 |
| Storage length | The length of the storage format | 9 bytes |
| Storage type | The type of the storage format | CHAR or VARCHAR |
| Storage location | The location of the authoritative source | HR employee record system |
| Encrypted | Whether this data element should be encrypted | Yes |
| Encryption algorithm | The encryption method used to encrypt this data element | Blowfish |
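As an illustration of how a metadata record like the SSN example might be represented and used in code, here is a minimal sketch. The ElementMetadata class and the helper functions are hypothetical, only a subset of the table's fields is modeled, and reading each 9 in a format as "one digit" is an assumption about how the format pictures are meant. The sample value 123-45-6789 is a placeholder, not a real SSN.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementMetadata:
    """Hypothetical record holding a subset of the metadata fields."""
    name: str
    owner: str
    scope: str
    business_format: str
    exchange_format: str
    storage_type: str
    storage_location: str
    encrypted: bool


SSN = ElementMetadata(
    name="SSN",
    owner="HR manager",
    scope="U.S.",
    business_format="999-99-9999",
    exchange_format="999999999",
    storage_type="CHAR",
    storage_location="HR employee record system",
    encrypted=True,
)


def matches(picture: str, value: str) -> bool:
    """Check value against a format picture; assumes 9 means one digit."""
    pattern = "".join(
        r"\d" if ch == "9" else re.escape(ch) for ch in picture
    )
    return re.fullmatch(pattern, value) is not None


def to_exchange(business_value: str) -> str:
    """Convert the business format (with hyphens) to the exchange format."""
    if not matches(SSN.business_format, business_value):
        raise ValueError("value is not in the business format")
    return business_value.replace("-", "")


def to_business(exchange_value: str) -> str:
    """Convert the exchange format (digits only) to the business format."""
    if not matches(SSN.exchange_format, exchange_value):
        raise ValueError("value is not in the exchange format")
    d = exchange_value
    return f"{d[:3]}-{d[3:5]}-{d[5:]}"


# The lengths in the table follow from the formats themselves.
assert len(SSN.business_format) == 11
assert len(SSN.exchange_format) == 9
assert to_exchange("123-45-6789") == "123456789"
assert to_business("123456789") == "123-45-6789"
```

Keeping the formats in the metadata record and validating against them means an application converts between the business and exchange forms at its boundary instead of hard-coding its own idea of what an SSN looks like.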


The information in the metadata repository is a valuable resource for system developers. Using the metadata repository, they can find out where to get identity data for their applications, who owns it, and what formats should be used. The metadata repository is also the basis for exchanging identity data.

There are numerous tools and methodologies available for creating metadata repositories. Large organizations will probably have the time and money to invest in these tools and train IT personnel in their use, but even small organizations will benefit from creating a simple metadata repository as the canonical source of information about identity data elements.

In creating the repository, it's important that it be accessible and that the rules for creating or updating records not be overly onerous. The goal is to create a tool that system architects and others can use to guide their work. For example, the repository should probably be online rather than stored in a spreadsheet on someone's personal machine.