Page 26
Table 2.3 States in Second Normal Form (2NF)
STATE | ABBREV | SPOP |
North Carolina | NC | 5M |
Vermont | VT | 4M |
New York | NY | 17M |
Table 2.4 Cities in Second Normal Form (2NF)
CITY | ABBREV | LPOP | CPOP | PCTINC |
Burlington | NC | 40K | 44K | 10% |
Raleigh | NC | 200K | 222K | 11% |
Burlington | VT | 60K | 67.2K | 12% |
New York City | NY | 14M | 14.7M | 5% |
Albany | NY | 500K | 540K | 8% |
White Plains | NY | 100K | 106K | 6% |
Third Normal Form (3NF): No transitive dependencies. No nonkey column depends on another nonkey column. A table is in 3NF if all of its nonkey columns are dependent on the key, the whole key, and nothing but the key. If, after eliminating repeating groups, every nonkey column is dependent on the key and the whole key, then this is 2NF. And nothing but the key is 3NF. Our city table (Table 2.4) doesn't pass this test because the column PCTINC (percent increase) depends on CPOP (current population) and LPOP (last year's population). In fact, it is a function of the two. This type of column is called a derived column, because it is derived from other, existing columns. However, all of these are nonkey. The immediate solution is to drop PCTINC and calculate it on-the-fly, preferably using a view if it is highly accessed. Also in our state table (Table 2.3), SPOP (state population) depends on ABBREV (abbreviation) because this is a candidate key, although not the primary one. Tables 2.5, 2.6, and 2.7 show our solution, which now gives us three tables in 3NF.
Table 2.5 States in Third Normal Form (3NF)
ABBREV | SPOP |
NC | 5 M |
VT | 4 M |
NY | 17 M |
Page 27
Table 2.6 State Names in Third Normal Form (3NF)
STATE | ABBREV |
North Carolina | NC |
Vermont | VT |
New York | NY |
Table 2.7 Cities in Third Normal Form (3NF)
CITY | ABBREV | LPOP | CPOP | PCTINC |
Burlington | NC | 40K | 44K | 10% |
Raleigh | NC | 200K | 222K | 11% |
Burlington | VT | 60K | 67.2K | 12% |
New York City | NY | 14M | 14.7M | 5% |
Albany | NY | 500K | 540K | 8% |
White Plains | NY | 100K | 106K | 6% |
The Boyce Codd Normal Form (BCNF): No inverse partial dependencies. This is also sometimes referred to, semi-seriously, as 31/2 NF. Neither the primary key, nor any part of it, depends on a nonkey attribute. Because we took the strict definition of nonkey, 3NF took care of our candidate key problem, and our tables are already in BCNF.
Fourth Normal Form and higher. Normalization theory in academia has taken us many levels beyond BCNF. Database analysis and design texts typically go as high as 5NF. 4NF deals with multivalued dependencies (MVDs), while 5NF deals with join dependencies (JDs). Although the theory behind these forms is a little beyond the scope of this book, you should know that a table is in 4NF if every MVD is a FD, and a table is in 5NF if every JD is a consequence of its relation keys.
Normal forms as high as 7 and 8 have been introduced in theses and dissertations. In addition, alternative normal forms such as Domain Key Normal Form (DKNF) have been developed that parallel or otherwise subsume current normalization theory.
Recommendation: strive for at least BCNF, then compensate with physical database design as necessary, which leads us to our next topic. If possible, study 4NF and 5NF and try to reach them in your normalization efforts. Your goal as a DBA is to normalize as high as you can, yet balance that with as few entities as possible. This is a challenge because, generally, the higher the normal form, the more entities produced.
Page 28