Page 19
Page 20
The best thing a DBA can do for his or her database is to start out with a proper, logical design. Unfortunately, database design is often hurried through, done wrong, and even back-engineered after the database has been built. A well-informed and wise DBA knows that a good design improves performance rather than detracts from it, contrary to popular wisdom. Indeed, jumping directly into the physical design or further simply invites trouble, not just in terms of performance, but in data integrity. What good is a database that runs fast and houses bad data? Early on in the design phase of a database system, a proper logical design can tolerate physical design changes later on in the production and maintenance phases. If, however, you shortcut the logical design, not only will you likely have to redesign your logical model, but also restructure your underlaying physical model. The indirect cost (staff hours, downtime, and so on) can be staggering. So let's cover the principles behind logical database design and normalization before you run off and build your database.
As the relational model came to dominate over other data models during the mid-seventies, relational modeling techniques sprung up that permitted formal design capabilities. The most popular of these is the Entity-Relationship Diagram (ERD), developed by P. P. Chen in 1976. This is known as semantic data model because it attempts to capture the semantics, or proper meaning, of business elements, the essence of the business. Because the relational model itself is mostly a syntactic model, one dealing mostly with structure, the ERD typically supplements it. In fact, ERD modeling naturally precedes relational modeling. Once an ERD is complete, it is mapped into the relational model more or less directly, and later the relational model is mapped to its physical model.
An entity is a business element, such as an employee or a project. A relationship is an association between entities, such as employees working on several projects. Attributes are the characteristics that make up an entity, such as an employee's salary or a project's budget. Attributes are said to take on values from domains, or value sets. The values they take will be the data used later on in the relational model. These are all abstractions of a business or part of a business. ERDs can be drawn many ways. It doesn't really matter as long as you choose one and remain consistent in your meaning throughout.
For our purposes, diagrams are drawn using boxes for entities, with the attribute names listed inside the box and the entity name listed outside the box. Arrows are drawn between the boxes to represent the relationship types. There are three kinds of relationships: one-to-one, one-to-many, and many-to-many. A one-to-one relationship uses a single-headed arrow on one or both sides, depending on the kind of one-to-one relationship. A one-to-many uses a double-headed arrow. A many-to-many uses a double-headed arrow on both sides. A pure one-to-one relationship exists when every value of one entity is related to one and only one value of another entity, and vice versa. This type of relationship is rare. Figure 2.1 shows a one-to-one relationship. A husband is married to only one wife, and a wife is married to only one husband. (We aren't counting polygamists.)
Page 21
FIG. 2.1
A one-to-one (1:1)
relationship.
A more common kind of one-to-one relationship is the subtype relationship. This is one of the foundations of OO analysis and design. In OO systems, this is seen as a class and a subclass (or more simply put, a class hierarchy). Figure 2.2 shows how a one-to-one subtype relationship is modeled. This diagram shows the classic example of a square being a subtype of a rectangle. The direction of the arrow indicates the direction of the inheritance, another OO concept related to the class hierarchy. In other words, attributes in the more general entity (the rectangle) donate attributes (such as length and width) to the more specific entity (the square). Hence, the direction of inheritance flows from general to specific.
FIG. 2.2
A one-to-one (1:1)
subtype relationship.
Subtype relationships are more common than pure ones, yet both find infrequent use. As is often the case, when a designer runs across one-to-one relationships, he or she must ask the following questions:
More often than not, one-to-one entities can be combined.
The dominant relationship to be used in the relational model is the one-to-many. Figure 2.3 shows a one-to-many relationship. A state has many cities, but those same cities belong to only one state. (It is true, however, that you will find a city's name reused by different states. This only means that, as a designer, your choice of primary key must not be a city name. For example, it might be state name + city name. The previous section contains a definition of the primary key concept and how to choose one.)
Finally, Figure 2.4 shows our many employees, seen earlier, working on many projects, a many-to-many relationship. Notice that the underlined attributes are identifier attributes, representing our best current guess about what will later be the primary key in the relational model.
Page 22
FIG. 2.3
A one-to-many (1:M)
relationship.
FIG. 2.4
A many-to-many (M:N)
relationship.
Suggestion: At this point, one of the best things you can do for yourself as a designer is to rid yourself of all your many-to-many relationships. Not that you'd actually be getting rid of them, but you can substitute two or more one-to-many relationships in their place. You want to do this because the relational model can't really handle a direct implementation of many-to-many relationships. Think about it. If we have many employees working on many projects, how do we store the foreign keys? (We can't without storing multiple values in one column, thus violating the relational requirement that data be atomic, that no cell may hold more than one piece of information. These two things also lead directly to, and in fact are the same as, First Normal Form, to be discussed shortly.) Hence, to ensure data atomicity, each many-to-many relationship will be replaced by two or more one-to-many relationships.
Hence, what you want to do is to split it so that many employees working on many projects become one employee with many assignments, and one project belongs to many assignments, with assignments being the new entity. Figure 2.5 shows this new relationship. Notice that identifier attributes have been combined.
The new entity called assignment is often called an intersection table in the relational model, because it represents the intersection of every real pairing of the two tables it relates. It is also called a join table. The intersection table is an entity that is not necessarily always a real-life abstraction of some business element, but it is the fundamental way to solve and implement the many-to-many relationships in the relational model.