Chances are good that when you started learning about development in the programming world, you began with flow-charting. Flow-charts are great tools for diagramming programmatic flow and for laying out the components of your system. The purpose is to discover logic flaws and missing functionality before coding, rather than during the cycle.
The reality is that flow-charts are rare. How many times have you been in a crunch to pull a flow-chart together for a project, but waited to do the chart until the project was completed? This situation is more common than any of us like to admit. Pulling a flow-chart together after the fact is easy, because you already know what you've designed in the program, and you're not going to flow-chart something that you opted not to include. This approach is fairly safe, provided that you reviewed the program along the way to ensure that its functions meet your customer's needs. Database design, however, isn't as open to modification along the way. Because changing a database table after the fact alters the foundation of all other parts of the system, even in subtle ways, it often requires major overhaul work in the balance of the system. A simple change in a table can mean that entire portions of the application stop functioning.
Probably the most important thing that you can do when you bring up a database-oriented system is diagram and design the database structure that will support the application. If you don't do this, you end up making some subtle (or not-so-subtle) changes in the structure later--probably at a time when the changes are far more expensive in the development cycle.
When you decide to develop a system based on Microsoft SQL Server, you've taken the first step toward implementing two separate architectures in your software: client/server and relational database tables. Both architectures are powerful if they are used correctly, and they present a real advantage to your system if you pay attention to the rules along the way.
This chapter introduces those rules, including the theory and practice of a client/server system, such as determining the best location to implement a given function and breaking apart procedural flow.
Because the decision to move to client/server depends heavily on the flow of the application and where things are physically being completed, SQL Server is an ideal component to provide server-side functionality and to bring client/server to your applications. You need to understand what pieces of the database design dovetail with your desire to move to the client/server world. n
When you start working with relational databases, you inevitably hear about data normalization and bringing things into third normal form. Normalization refers to how you implement the relationships and storage of data in your database tables. When you normalize a table, you try to limit the redundant data in the table. Many levels, or types, of normalization exist.
Your overall goal is likely to be a 3NF, or third normal form, database. In most cases, this level of normalization is the best compromise between extremes when it comes to normalization versus functionality and ease of implementation. Levels exist beyond 3NF, but in practice, they can cloud the database with more design issues than functional issues.
When you delve into the world of normalized databases, you are, by definition, starting down the road of relational databases. Before normalized databases, structures used a series of pointers to retain relationships between different tables and values. You may recall implementing linked lists, in which each row of a database table contains a pointer to both the following and preceding rows. To traverse the database, you simply walked up and down this list of links between records.
In the relational world, you define columns that relate to each other between tables. These columns are keys to other values. Keys are used to uniquely define a relationship to another instance or set of information. This chapter discusses keys in the definitions of the normalization levels.
What's in a Name?
A key difference between SQL Server-type database implementations and other, more traditional PC-based databases is the terminology that describes the databases and their information. Keep in mind that a device, or physical file on the disk drive of the server, contains one or more databases. Databases contain one or more tables, and tables contain one or more columns of information. For each table of columns, one or more rows may exist.In more traditional terms, there was no concept of a database, as in SQL Server. Instead, you had a file that contained records of fields. The following table shows a basic comparison of terms between a SQL Server implementation and a more traditional database, such as Btrieve or dBASE:
New Term Old Term Device N/A Database File Table N/A Column Field Row Record
You need to keep these terms in mind. If you use the new terms in describing the tables that you're designing, you can prevent problems of ambiguity between developers and designers.
With relational databases, you don't use ordered, or sorted, rows. You use real-time statements--those that are evaluated when they are called or issued--to control the presentation of the information. You also use joins and views to control the way that information is retrieved, rather than try to store the information in the most advantageous format possible at the outset. This method allows for more dynamic access to the information in the database tables; it also allows you to simply store the information and then retrieve it in any manner.
The next few sections examine the different types of normalization, from the first through the third normal form.
In first normal form, or 1NF, the foundation for the relational system is put into place. In 1NF, you don't have multiple values represented in any single column. In database terms, each value in the database table is atomic, or represented only one time.
In the past, you may have implemented a database in which you stored the item code for each item ordered with the order record, as in a point-of-sale system. Later, when your program queried the order, it retrieved and parsed this field, and could determine what was ordered with that order record. Figure 4.1 shows an example of this type of table. You have the opportunity to store one or more item numbers with the order record.
FIG. 4.1
Without 1NF, you could store more than one logical item in a physical record.
This approach isn't valid when you normalize your database.
NOTE: These examples were created in Microsoft Access working with SQL Server, because it's a good tool for creating tables quickly and easily. The tool that you use is entirely up to you. The SQL Enterprise Manager, although a bit less visual, can still serve as an excellent design and testing tool.
In 1NF, duplicates aren't allowed. You need to create a schema in which only one item is recorded for each order record on file. To implement the point-of-sale solution mentioned earlier in this section, you would have an order represented by one to n records containing the item-code information that made up the order. This solution provides a slightly different challenge in retrieving the information for the record. You must have a means of retrieving each record associated with the order and making sure that you've retrieved only those records--no more or less. This leads to order numbers called out in each record, of course.
Figure 4.2 shows the result of this first pass at normalizing a database table.
FIG. 4.2
In a 1NF table, each row is a single atomic record that must be capable of standing
alone.
For now, simply remember that in 1NF, each row must contain only one instance of the information and all column values must be atomic.
The first requirement of second normal form, or 2NF, is that it fulfill the requirements of 1NF. The second requirement is that each instance or row in the database table be uniquely identifiable. To do this, you must often add a unique ID to each row. In the example in the preceding section that breaks apart the orders table, it looks at first as though the structure fits this rule. You have, after all, instituted an order ID, and if you combine the order ID and the item code, you have a unique handle in the row. Right?
Not exactly. You could conceivably have a single order with more than one instance of an item. Consider an example: When you do your grocery shopping and buy your milk for the week, you may buy multiple half-gallons of milk in the same order.
You need to add an OrderItemID column to the table to fulfill the requirements of 2NF (see Figure 4.3). You keep the OrderNum, but OrderItemID is the primary unique identifier for the item in a given row.
FIG. 4.3
After you add a unique row ID for each line item, this table fits the 2NF model.
NOTE: Notice that although a new item has been added as item 11, it can still be related to the order based on the order number. What has been done is to provide a way to uniquely identify the row within the table.
Third normal form, or 3NF, is a life-saver for the developer. All the work of normalizing your database tables really pays off when you move to the 3NF model. As 2NF relies on first being in 1NF, 3NF requires that you be compliant with the 2NF model. In short, when you have a table that's in 3NF, you won't have redundant nonkey information in your table that relies on nonkey information in another table.
That definition seems to be strange until you understand what's really happening. The goal of normalizing your tables is to remove redundant nonkey information from your tables. In Figure 4.3 earlier in this chapter, the model is broken nicely. Because you're storing descriptions in the table, and because these descriptions probably are used elsewhere in the database, storing them individually in the table is a problem.
Here's where you start to see the real advantages of moving to 3NF. The simple examples in this chapter relate to a grocery store.
Imagine that you're storing your information as shown in Figure 4.3 and need to change the description of Milk to Milk 3%, because your vendor has maliciously introduced Milk 1% into the mix. You need to write a utility that updates the sales database tables and any other table that ever referenced Milk to show the correct percentage. A real nightmare that often leads to problems is inconsistent information in the database. Forgetting about, or not knowing about, only one table that needs to be updated can invalidate your use of an item for the sales history.
Normalization is the key to taking care of this problem. In Figure 4.4, the problem is corrected simply by removing the description from the table.
FIG. 4.4
To fit the 3NF model, the description column is removed from the order items table.
Because an item code is already assigned to Milk, you have the information that you need to set up the Inventory table shown in Figure 4.5.
FIG. 4.5
By creating an Inventory table, you create a "home base" to refer to and
use any time you reference inventory items.
After you begin normalizing your table, you begin to wonder how to get a complete picture of the item sold. How can you find out all the information about the line item--its description, the sale price, and so on? This is where relational databases and their use of views come into play. In the example, you can create a query that returns the information that you need quickly and easily. Figure 4.6 shows what a sample view returns to your application for the table examples.
FIG. 4.6
When you create a relational view of the database tables, you can retrieve and work
with a complete picture of the information, although the information may be dispersed
across several tables.
You can see that by using the combined information between the tables, you still have the same full data set to work with, but you're limiting the sources of information. Often, the way that you end up working with the information does not change--only the methods used behind the scenes to retrieve that information. This is the case with the 3NF table in this example. You're still storing the same information base. You're just retrieving the information by using a different method--in this case, a logical view of the two related tables.
Before you start designing your database, you need to understand where the functional components of your system will reside. It's important to understand where data manipulation is done and what should be stored in the database compared with what should be calculated or determined on-the-fly.
With SQL Server, you can implement true client/server systems. These systems can adhere to the concepts of client/server, allowing you to divide functional components into cooperative operations that accomplish your application's goal. This sounds strange, but what it amounts to is dividing processing between the client and server in a way that makes sense to the application. In database-oriented systems--especially those in which the database subsystem is open and accessible from many points--implementing an intelligent database layer to manage the data makes sense. This layer is responsible only for storage and inquiries as they relate to the information; it has no responsibility for the presentation of information.
The next couple of sections review the types of functions and operations that reside in the client and server sides of the client/server model. Although these concepts aren't exhaustive, you need to understand them.
For many people, client/server is just a fancy term for a PC database that resides in a common location and is accessed by many workstations. After reading this chapter, you should understand that client/server is much more than a common storage location. You can't create a client/server system by using Microsoft Access database tables, for example, whether the tables are stored on a file server or a local system, because no intelligent engine can process the database independently of your application. The logic controlling the data is still driven by your client-side application.
NOTE: You can create Access-based client/server systems by creating linked or attached tables to an Access database. These tables can be based in a server-based intelligent database system and can help you create a client/server system in Access. The statement that Access database tables aren't client/server refers only to the native Access database tables, which typically are contained in physical files that have the .MDB extension.
Client-side applications are responsible for displaying information to users, manipulating information in the database and on the user's screen, displaying reports, and providing for user-interruptible operations. Any operation that you submit to the server component of your system should never require users to intervene to complete the operation.
The client application typically is written in a host language, such as Delphi, PowerBuilder, Visual Basic, C, and C++. These applications allow users to perform, add, change, and delete operations against the database, when applicable.
The client application should avoid, at nearly all costs, having to work with the entire contents of the database tables. When a set of information is worked with by the client application, you should always think of it as being a results set, not the entire data set that is available to you. A results set indicates that you should ask the server application to filter and limit the information that will be presented to you, so that the operations you carry out are completed against as small a set of information as possible.
One of the best comparisons of older systems and client/server compares a file cabinet and a folder. In older systems, you typically do the equivalent of asking for a file cabinet full of information, so you can take the time to sift through the contents to find the file that you want. In this scenario, your client-side application is the piece that does the sifting. All information from the database table is passed through the client, with does the filtering to find the information that you want to work with.
In the client/server world, however, you simply request the file folder that you want. You don't filter through the file cabinet; the server process does. This arrangement limits network traffic, because only the results set is passed back over the network. Another significant benefit of this arrangement is that it also increases performance for your application. Typically, server systems are powerful computing platforms. Because this optimized server platform can work with all information locally, it can do so at top speed, processing the information at the best rate possible.
In short, your client-side application should be optimized to work with the results sets. This fact works hand-in-hand with database structure and design, because you need to make sure that you create the database in such a way that it supports this requirement. You have to define in the contents of the database the joins, queries, stored procedures, and table structures that support this optimized query.
In summary, following are some guidelines for the client side of your application:
The server side of the client/server equation typically is task-oriented, which means that operations are broken into logical components. This process is what you're starting to see now with Microsoft's BackOffice offerings. You have server-side components that control mainframe connectivity with the SNA (Systems Network Architecture) Server, database access with SQL Server, electronic mail with the Exchange Server, Internet and intranet access with Internet Information Server, and more products on the horizon that continue in this vein.
With SQL Server, your goal is to create the results sets that the client-side applications require. The database engine is responsible for carrying out information storage, updating, and retrieval in the system. When you start working with SQL Server, you'll notice that it has no user interface (UI) at all. Utilities are available to help you manage it, but SQL Server in and of itself has no UI. This situation occurs by design. SQL Server exists to fulfill requests made of it to the point of returning the results from those requests. Unlike Access, dBASE, FoxPro, and similar products, SQL Server has no involvement in showing users the results of these queries.
When you design your database structures, you need to keep close watch on the way that you implement informational control in your system. Different people may need different access levels to the information, for example, so security should be a major issue in querying the database. In such a case, your table structures and joins need to reflect this requirement.
Also keep in mind the way that users are going to access your information. In today's open systems, new challenges exist in presenting and controlling information. As you create your database tables and the rules that govern them, you need to assume absolutely nothing about the client side of the application. A good question to ask yourself is, "When I receive this information, what needs to happen with it?" You can answer this question by saying, "It needs to be stored for later retrieval."
If you're storing sales information, you should validate the item code of the item that is being sold. Does it exist in the inventory database? Is sufficient stock on hand to sell this item? Do you force sufficient stock levels, or do you allow a "negative stock" situation to occur and simply log the discrepancy in a suspected transactions table?
Each issue requires work on the database side through rules and triggers that allow you to define the functions carried out automatically by the server, based on database values. True, you could expect the client application to complete these tasks, but what if someone is accessing your database from Excel or Word? Can you really assume that the user has made sure that these checks are taking place? The server should carry out these important tasks to make sure that they are done, regardless of the point of entry.
NOTE: For additional information about rules and triggers, see Chapter 15, "Creating and Managing Triggers." In Chapter 13, "Managing and Using Rules, Constraints, and Defaults," you can find out more about server-side automation. In addition, see Chapter 25, "Accessing SQL Server Databases Through Front-End Products."
There are exceptions to every rule, and the client/server model certainly is subject to this fact. These may be times where you need to do more processing on the client, or cases on the server where you want to blindly store information received. You need to address these situations on a case-by-case basis, but keep in mind that the client/server model is there to help and guide your efforts. Always be cautious when you develop systems that fall outside the model, because more often than not, you're asking for trouble.
CAUTION: While you're implementing an intricate trigger or rule, it may seem to be the right thing to do, but it can become a nightmare if you try to move too much functionality to the wrong side of the client/server model.
Think long and hard about other ways that you can implement something if you put into place an operation that breaks the client/server model. You'll be glad that you did.
At the start of this chapter, it was stated that one way you can design your database is diagram it and work out the relationships between tables on paper first. This practice helps point out any flaws in the points of information that you may need to extract from the system.
Database flow-charts consist of entity relationship diagrams, or ERDs. ERDs show exactly how a database is structured, what the relationships between the tables are, what rules and triggers are involved in maintaining referential integrity, and so on. One benefit of the ERD is that you can sit down with the client and take a logical walk through the database, making sure that the system serves the client's needs.
NOTE: It's beyond the scope of this book to provide an all-encompassing view of the intricacies of entity relationship diagramming. The information provided in this chapter is meant to fit 90 percent of the cases for what you'll be doing. In some cases, you need to implement slightly different or less frequently used facets of the ERD systems. In those cases, you'll be best served by consulting the capabilities of your design software and database back end, as well as the resources available on the Internet and in other sources of information on the world of ERD.
Entity relationships are shown by drawings that include several objects. These objects include entities, attributes, and relationships. You can depict each aspect of your system in a specific way. Figure 4.7 shows a basic diagram for the point-of-sale system discussed in this chapter.
The customer table was added to track an order for a customer, but apart from this addition, the table structure reflects the earlier tables and relationships. Look at the way that these basic objects--the entities, attributes, and relationships--apply to the simple model.
FIG. 4.7
This diagram shows a relational diagram for the grocery sales system.
Using and Referencing Entities Figure 4.8 shows four boxes, each of which represents a table, or entity. These entities will become the tables in the database, and each box includes the columns that will be created for the table. Each entity's object has two sections, with a portion above the dividing line and a portion below it.
TIP: When you name entities, you should always make their names singular. This practice helps reinforce the fact that an entity contains only one instance of the object it represents.
The portion above the dividing line represents the identifying portion of the row. Remember that to have a normalized database in 3NF, you need to be able to uniquely identify each row instance in the database table. By placing the identifying characteristics above the line, it's easy to read and determine how the record will be retrieved in most cases.
In Figure 4.8, you can see that by the definition for the customer table, you'll most likely retrieve records from it by using the CustomerID.
NOTE: Although a record may usually be retrieved by this identifier, it's not an exclusive handle to the row. In most systems, you need to provide other means of retrieving rows. In the Customer table, for example, you probably will need to implement some name searches.
These searches won't include the Customer ID, but after you find the customer that the user wants to work with, you are likely to retrieve the Customer ID for the selection and then retrieve the entire customer record that was selected.
FIG. 4.8
The basic entity is represented by a box that typically contains two sections.
Using and Referencing Attributes Attributes go hand in hand with the entity objects.
Attribute is the term for the different column elements that make up the entity object--the table in the database. Attributes for the Customer table include CustomerID, Name, Company, and so on.
Attributes are described as key or non-key. As the name implies, non-key attributes are those items that make up the entity that don't depend on any other entity. In other words, non-key attributes don't make up, or constitute part of, a key that's used in the entity.
Key attributes come in two types: primary and nonprimary. Primary keys are always shown above the line, indicating that they're identifying attributes for this entity. If the attribute is a key to the entity but not part of the identifying structure for the entity, it's placed below the line.
If an item refers to a key value in another table, it's known as a foreign key. Again, if you reference the basic model, as shown in Figure 4.9, you can see that the Customer table doesn't have any foreign-key segments, but the Order table does, as indicated by (FK). The foreign keys in the Order table are nonidentifying, but they help designate the customer to whom the order refers.
Moving from the Order table to the Line Item table, you see that the OrderNumber is listed as an identifying component of the Line Item table. To find a specific instance of an order line item, then, you need to know the OrderItemID, the ItemCode, and the OrderNumber. In this implementation, the Line Item table is an associative table between the Inventory table and the Order table.
FIG. 4.9
The basic ERD shows foreign keys as primary (identifying) and non-primary (nonidentifying)
columns in the sample tables.
Using and Referencing Relationships If a "proof-is-in-the-pudding" segment of database design ever existed, it's in the relationships that you define between the different entities. It's easiest and most descriptive to look to your database ERDs to tell a story, in plain English, about what's happening in the database. In Figure 4.9, you see verbs between the entities. These verbs describe the relationships between the two entities and are indicated by the relationship line between them. A customer makes orders in the Order table, and an order includes line items that reference inventory items. You can also read the diagram in the other direction. You could also say that inventory items are referenced by line items. In any event, you should be able to show concise paths for information to follow when you are trying to reach an end result.
The examples use ERwin by Logic Works. This tool allows you to define the different objects and then place the relationships between the objects appropriately. In the relationship between the Customer and Order tables, Figure 4.10 shows that the relationship is nonidentifying and that the relationship is a zero, one or more relationship.
You can also see that the key in the Order table that will be used to retrieve the customer information is CustomerID. This key is automatically added to the Order table and to the non-key portion of the record. If you define an identifying relationship, CustomerID is moved to the top portion, or identifying-key portion, of the Order table entity.
FIG. 4.10
ERwin allows you to define the relationship between the Customer and Order tables
easily.
Each of the other relationships is defined in a similar manner. You can walk down the table structure and determine exactly how the different entities will interact. The following sections guide you through the design process with the customer.
Creating a solid definition for a database structure involves several steps. In many cases, you can point to flaws in the database design, only to realize that if the customer were more completely involved in the process, the problem could have been averted.
While you are bringing a customer up to speed, terminology, methodology, and approach aren't the goal of client reviews. The goal of any system that you'll endeavor to write is to ensure that the database structure supports the functionality of the system. This is where the maxim "Determine the output first; then input will follow" comes to bear on a project. It's certainly true that the test of any system is the output. If you created the best system ever devised to allow input of information but didn't provide a way to get meaningful information out of the system, you wasted your time.
The first thing that you need to do is decide what you'll be providing for the customer. Reports and output nearly always are a developer's least favorite part of a system to develop and implement. Often, one of the first statements that a developer makes to a customer is, "We'll give you reporting tools to create your own reports. All we need to do now is figure out what needs to happen in the program." This statement is a formula for problems.
If you don't know that the users need to have an aging report from their point-of-sale system's accounts-receivable subsystem, for example, would you automatically store the date on which the original invoice went out?
NOTE: Involving the user may seem to prolong the development process. But many studies and personal experiences have proved otherwise. Spending the time now pays off many times over later in the project, in terms of more accurate deadlines, correct designs, and the like.
Truly, the only way you can ensure that you're not coding in vain is to make sure that you fulfill the output needs for the system.
Following is a general set of guidelines that not only allows you to ensure that you've hit at least the high points of your target audience, but also relates nicely to the database-design topics covered in this chapter:
These questions are the test of your database design. You should be able to address each of the stated intentions for the system.
NOTE: Systems get very complex very fast. The importance of reviewing systems with users can't be overemphasized. In one such case, working with the users prevented a substantial design rewrite. Although the information to determine a specific-case customer was available, the database tables hadn't provided for the relationships correctly. Preventing this oversight early on saves many, many hours of development time in the long run.
In some cases, you face a join situation that won't resolve to a single instance on either side of the database-table equation. In such a case, consider implementing an associative table that provides a link between the tables. You can see a simple example of this technique in the sample system, because you could just add ItemCode to the Order table.
NOTE: A join is a way of creating a logical view of your data. You specify how the information is retrieved, and what information you want to see. Then SQL Server returns the results set to your client application in the form of a set of data from the tables as you've requested.
If you didn't have the associative table between the Order Table and the Inventory table, you would end up with a many-to-many relationship between the two tables. This isn't a good way to accomplish the relationship, because you wouldn't have a singular path for identifying an instance of an order record. Figure 4.11 shows the associative table, the Line Item table.
FIG. 4.11
An associative table has been implemented to remove the many-to-many relationship
problem imposed by the Order and Inventory tables.
Sometimes, fully normalizing the database tables just doesn't work with the model that you're putting into place. You're most likely to encounter this situation in terms of performance. You may end up with a join to return information that simply takes too long to complete.
If you're working with static tables in your application (as may be the case with the inventory table in the example), you may want to load that table to a local Access table for access by your application. If you were to do this once a day, you'd be reasonably assured of having the correct, up-to-date information at the workstations.
You may be loading down tables that, when considered alone, don't provide a complete picture. If you have a customer table, an account-balance table, and a sales-representative table related by means of customer number, you end up with a three-way join to return a complete picture of the information.
You may want to consider denormalizing this set of tables at the client. In this case, you could create a single table that holds the information from all three tables as a single row of information. Thereafter, when you request information for customer X, you receive all the information that you need, and a further join isn't required.
NOTE: In reality, you wouldn't want to manipulate something as potentially dynamic as the customer-account balance in a remote mode. As a rule, you should use this method of denormalization only for static tables, or at least for those that change infrequently.
In this chapter, you reviewed a great deal of information that can pay off in the long run for your applications. Be sure to normalize your database tables to the highest degree possible. If you find exceptions to the normalization goals, you should make sure that you're not overlooking some other method of accomplishing the same task.
Review your plan carefully with your customers, whether the customers are internal or external clients. This review provides substantial leverage in your projects and helps you come in on time and on budget.
The following chapters provide additional information that you'll find useful in your database-design efforts:
© Copyright, Macmillan Computer Publishing. All rights reserved.