As you review, consolidate, and create identity data, there are several important principles that you should keep in mind. Some of these are adapted from The Practical Guide to Enterprise Architecture by James McGovern, et al. (Prentice Hall).
Wherever possible, you should avoid replicating identity data and ask for it from its canonical source instead. For example, if an application needs an employee's SSN, it is better to retrieve it from the HR employee record, using a SAML request or database query, than to store a copy of its own.
If you can't avoid replicating identity information, you should do so only because the business requirements force the replication. Application developer convenience is never a good reason for replication. One of the primary reasons for this principle is that identity data is often subject to internal and external audit, security, and privacy requirements that will have to be monitored and paid for by the business unit. Every additional copy increases the cost of meeting those requirements, so that cost ought to be weighed against the business requirement driving the replication.
Break this principle at your own peril. As soon as multiple systems can change replicated data, the copies are virtually guaranteed to fall out of sync.
Applications should be constructed so that they don't rely on the identity data being in a particular data source or on a particular machine. This increases flexibility when the application changes in the future. For example, using SAML to request properties of a particular identity over the network, rather than reading them with an SQL query over JDBC, makes it easy to replace the database with another application without breaking the system.
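One way to picture this kind of location independence is an application that asks an abstract attribute source for what it needs and never learns where the data lives. The following is a minimal sketch, not a prescribed design: the names `AttributeSource`, `InMemorySource`, and `mailing_label`, as well as the attribute names, are hypothetical, and the in-memory class merely stands in for a real backend such as a SAML attribute query or a directory lookup.

```python
from typing import Protocol


class AttributeSource(Protocol):
    """Anything that can answer: what is attribute `name` of identity `subject_id`?"""

    def get_attribute(self, subject_id: str, name: str) -> str: ...


class InMemorySource:
    """Hypothetical stand-in for a real backend (SAML query, directory, database)."""

    def __init__(self, records: dict[str, dict[str, str]]) -> None:
        self._records = records

    def get_attribute(self, subject_id: str, name: str) -> str:
        return self._records[subject_id][name]


def mailing_label(source: AttributeSource, subject_id: str) -> str:
    """Application logic: knows which attributes it needs, not where they are stored."""
    name = source.get_attribute(subject_id, "display_name")
    office = source.get_attribute(subject_id, "office")
    return f"{name}, {office}"
```

Because `mailing_label` depends only on the `get_attribute` method, the backing store could be exchanged for another implementation without touching the application code.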
The more you can build consistency into the infrastructure, the more likely it is that the data will remain consistent. We'll discuss the role of policy in this principle in the next chapter.
Whenever identity data will be collected and used in multiple locations, each system should perform validity checks on the identity data as part of its processing. These checks should ensure that the identities conform to the relevant schema and any business rules documented in the inventory or metadata repository.
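A validity check of this kind might look like the sketch below. The schema, the field names, and the two business rules are invented for illustration; in practice they would come from the inventory or metadata repository rather than being hard-coded.

```python
import re
from datetime import date

# Hypothetical schema: field name -> (expected type, required?)
SCHEMA = {
    "employee_id": (str, True),
    "email": (str, True),
    "hire_date": (date, True),
    "department": (str, False),
}

# Deliberately loose pattern: something@something.something
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_identity(record: dict, today: date) -> list[str]:
    """Return a list of problems; an empty list means the record passed."""
    errors = []

    # Schema checks: required fields are present and values have the expected type.
    for field, (expected_type, required) in SCHEMA.items():
        if field not in record:
            if required:
                errors.append(f"missing required field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"wrong type for {field}")

    # Schema check: no fields beyond those the schema defines.
    for field in record:
        if field not in SCHEMA:
            errors.append(f"unexpected field: {field}")

    # Business rules (illustrative assumptions only).
    email = record.get("email")
    if isinstance(email, str) and not EMAIL_RE.match(email):
        errors.append("email is not well formed")
    hire_date = record.get("hire_date")
    if isinstance(hire_date, date) and hire_date > today:
        errors.append("hire_date is in the future")

    return errors
```

Returning the full list of problems, rather than stopping at the first one, lets each system log or report everything wrong with a record in a single pass.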
As we've discussed, open standards have a number of advantages over proprietary standards.
Sensitive identity data should never be stored unencrypted. Usually, encrypting the entire record is too disruptive, because the whole record has to be decrypted each time any part of it is used. A better strategy is to encrypt just the data elements that are sensitive. As we saw in Chapter 6, the XML Encryption standard can be used to encrypt only certain elements in an XML record.
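The same element-level idea applies to records that are not XML. The sketch below shows only the structure of the approach: the function names are hypothetical, and the cipher is deliberately left as a pair of caller-supplied functions, on the assumption that they would wrap a vetted encryption library and proper key management rather than anything homegrown.

```python
from typing import Callable

# A cipher operation takes bytes and returns bytes. The caller supplies both
# directions; this sketch does not implement any cryptography itself.
Cipher = Callable[[bytes], bytes]


def encrypt_fields(record: dict[str, str], sensitive: set[str], encrypt: Cipher) -> dict:
    """Return a copy of the record with only the sensitive fields encrypted."""
    stored = {}
    for name, value in record.items():
        if name in sensitive:
            stored[name] = encrypt(value.encode("utf-8"))  # ciphertext bytes
        else:
            stored[name] = value  # non-sensitive fields stay readable
    return stored


def read_field(stored: dict, name: str, sensitive: set[str], decrypt: Cipher) -> str:
    """Decrypt on demand, and only the single field being read."""
    value = stored[name]
    if name in sensitive:
        return decrypt(value).decode("utf-8")
    return value
```

With this arrangement, reading a non-sensitive field such as a display name involves no decryption at all, which is the practical advantage over encrypting the entire record.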