The first step in creating data architectures is to gather baseline information about identities in your organization. The baseline inventory identifies high-level data sources and documents pertinent information about them. To do that, we'll start with the processes that we identified in the process inventory and find the identity records that are critical to performing those processes. Let's run through an example to see how that might work.
Suppose you've identified "employee provisioning" as one of the processes important to your organization. That process starts when the decision is made to hire an applicant and includes steps such as the following:
Enter applicant data into the HR system.
Identify the hiring manager (i.e., the person who will be the new employee's boss).
Set up the employee's 401(k) account, payroll account, health insurance, and other benefits.
Assign the employee an office.
Order, install, and set up a computer including application software.
Set up an email account and access.
Set up network access.
Order and install a phone.
Update the appropriate directory or directories with the new telephone number, email address, and office location.
Set up a voicemail account.
Establish access controls for all of the enterprise applications that the employee will need to work.
Issue a credit card for travel expenses.
When you look at the steps in this process, the employee is right up front. Consequently, it's easy to identify the employee record in the HR database as one of the identity records in this process, but there are others. Here are some of them:
Employee record in the HR system. The employee has a Social Security number (SSN) that serves as a unique identifier. Most large organizations also assign a unique employee ID.
Employee record at the 401(k), payroll, and benefit providers. These accounts would each have their own unique identifiers. Depending on your privacy policy and that of your partners, the employee's SSN can be used to tie them all together.
Record of offices with their location, size, and other properties. Offices might be assigned to certain groups or departments and would need to be associated with their occupants. The identifier is proprietary, that is, an internal scheme defined by your organization.
Record of the computer and any installed software. These all have serial numbers. In addition, each network adapter has a MAC address that uniquely identifies it.
Email and network access record in the proper directory or directories. There would be an email address and network identifier assigned as part of this record.
The phone system has records that identify phone lines. The telephone equipment has a serial number, and the phone number itself represents an endpoint on the telephone network. The phone number would need to be mapped to the office where it's installed.
The voicemail account would have an identifier and be tied to the phone number.
Each enterprise application that the employee needs access to (such as the CRM system) would have some way of identifying the employee.
The credit card represents a separate identity document that the company or its payment partner may track. The credit card number is the unique identifier in this case, and your financial services provider will likely assign additional identifiers for online access to the account information.
As we consider a single business process, it's amazing to see how it can translate into multiple identity records. Process links these records even if the infrastructure does not.
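To make the point concrete, the employee provisioning example can be sketched as a mapping from each identity record to the kinds of identifiers it holds. This is a minimal sketch only: the record names and identifier labels below are hypothetical placeholders, not a schema from any real HR, payroll, or directory product. The hypothetical helper `shared_identifiers` finds identifier kinds that appear in more than one record, which shows where the infrastructure could link records and, by omission, where only the process links them.

```python
from collections import defaultdict

# Hypothetical identity records produced by the employee provisioning process.
# Each record maps to the *kinds* of identifiers it holds (not actual values).
# Whether the SSN may be used to link records depends on privacy policy.
PROVISIONING_RECORDS: dict[str, set[str]] = {
    "hr_employee": {"ssn", "employee_id"},
    "benefits_401k": {"ssn", "401k_account_id"},
    "payroll": {"ssn", "payroll_id"},
    "office": {"office_id"},
    "computer": {"serial_number", "mac_address"},
    "directory": {"email_address", "network_id"},
    "phone_line": {"phone_number", "equipment_serial", "office_id"},
    "voicemail": {"voicemail_id", "phone_number"},
    "travel_card": {"card_number"},
}


def shared_identifiers(records: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return identifier kinds found in more than one record,
    each mapped to the sorted names of the records that contain it."""
    holders: dict[str, list[str]] = defaultdict(list)
    for record, identifiers in records.items():
        for ident in identifiers:
            holders[ident].append(record)
    return {
        ident: sorted(recs)
        for ident, recs in holders.items()
        if len(recs) > 1
    }


if __name__ == "__main__":
    for ident, recs in sorted(shared_identifiers(PROVISIONING_RECORDS).items()):
        print(f"{ident}: {', '.join(recs)}")
```

Records that share no identifier with any other, such as the computer or the travel card in this sketch, are tied to the employee only through the process itself.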
Going through the process inventory to find identity records creates an initial identity data inventory. The data inventory is a listing of each identity data source, its contents, and other important metadata. The following attributes should be recorded as part of creating the inventory:
Name: This could be something created at the time of the inventory, or it may be a name that the owner or custodian already uses to identify the data record.
Version: Your organization might not think of data stores as something that is under version control, but it's a useful idea and one you can start with the baseline inventory.
Description: Define the data record and its purpose; limit this to one or two sentences.
Processes: What process (or processes) does the data record support?
Identifiers: Which fields in the data record are uniquely identifying? There can be more than one identifier. For example, in an HR data record, the SSN and employee number are both identifiers. Many database records have a record number that is unique but has no identity meaning beyond the record itself. Avoid listing this unless it's meaningful to the business.
Attributes: These are all of the other attributes, traits, preferences, and characteristics that are included in the record and are used as part of the identity.
Owner: Who is the owner of the information?
Custodian: Who is the custodian of the information?
Comments: Any other relevant information about the identity record should be recorded here.
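One way to keep inventory entries consistent is to give every entry the same shape. The following is a minimal sketch, assuming a Python dataclass whose fields mirror the attributes listed above. The `missing_fields` check, and the choice of which fields count as required, are illustrative assumptions rather than part of the inventory method itself; the HR entry is a hypothetical example.

```python
from dataclasses import dataclass, field


@dataclass
class IdentityDataSource:
    """One entry in the baseline identity data inventory (illustrative)."""
    name: str
    version: str
    description: str                # one or two sentences
    processes: list[str]            # processes the record supports
    identifiers: list[str]          # uniquely identifying fields, e.g. SSN
    attributes: list[str] = field(default_factory=list)  # other traits, preferences
    owner: str = ""
    custodian: str = ""
    comments: str = ""

    def missing_fields(self) -> list[str]:
        """Return names of fields this sketch treats as required but left empty."""
        required = {
            "name": self.name,
            "version": self.version,
            "description": self.description,
            "processes": self.processes,
            "identifiers": self.identifiers,
            "owner": self.owner,
            "custodian": self.custodian,
        }
        return [key for key, value in required.items() if not value]


# Hypothetical entry for the HR employee record; the custodian is not yet known.
hr_record = IdentityDataSource(
    name="HR employee record",
    version="1.0",
    description="Master record for each employee, created when an applicant is hired.",
    processes=["employee provisioning"],
    identifiers=["ssn", "employee_id"],
    attributes=["name", "hire_date", "department", "hiring_manager"],
    owner="Human Resources",
)
```

Calling `hr_record.missing_fields()` here returns `["custodian"]`, which flags the entry for follow-up before the inventory is considered complete.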
One approach to creating the data inventory is to have business units create inventories for the identity data they own and then aggregate the results. Be careful, however, to train the people performing the inventory so that you get consistent results. You will also need to make sure you don't miss data sources that are jointly owned or owned at the enterprise level. As we've seen, business function and, hence, process do not always fall within neat organizational boundaries. You may find that it's more convenient to create the baseline data inventory in conjunction with the process evaluation that we discussed in the last chapter, instead of doing it as a separate step.
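If business units inventory their data separately, the aggregation step can at least surface sources that more than one unit reported, which are candidates for joint or enterprise-level ownership. The sketch below assumes each unit submits a plain list of data source names; the function `aggregate_inventories` is hypothetical, and its name normalization (trim whitespace, lowercase) is a deliberately simple assumption. Real inventories need agreed naming conventions, which is part of what training the inventory takers should establish.

```python
from collections import defaultdict


def aggregate_inventories(
    unit_inventories: dict[str, list[str]],
) -> tuple[list[str], dict[str, list[str]]]:
    """Merge per-business-unit inventories of identity data sources.

    unit_inventories maps a business unit to the data source names it reported.
    Returns (all_sources, shared): every distinct source name (normalized and
    sorted), and each source reported by more than one unit mapped to the
    sorted list of units that reported it.
    """
    reported_by: dict[str, set[str]] = defaultdict(set)
    for unit, sources in unit_inventories.items():
        for source in sources:
            # Simplistic normalization so "CRM System " and "crm system" match.
            reported_by[source.strip().lower()].add(unit)
    all_sources = sorted(reported_by)
    shared = {
        source: sorted(units)
        for source, units in reported_by.items()
        if len(units) > 1
    }
    return all_sources, shared


if __name__ == "__main__":
    inventories = {
        "HR": ["HR employee record", "Benefits enrollment"],
        "IT": ["Directory", "CRM system"],
        "Sales": ["CRM System "],
    }
    sources, shared = aggregate_inventories(inventories)
    print(sources)
    print(shared)
```

Note that this only catches sources that at least one unit reported; data owned at the enterprise level that no unit claims will still have to be found through the process review.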