Today you start with background material on SQL Server and Windows NT. You then look at databases and what makes up a client/server environment. Databases and what they contain are the next subject. You end the day with a look at designing databases. Your exercise for this day is to go through a simulated interview with a client and look at a design for a simple database.
Microsoft has entered the database market with a very strong product in SQL Server for Windows NT. SQL Server can run on either Windows NT 3.51 or 4.0. SQL Server's price and performance records have helped many companies have the power of a Relational Database Management System (RDBMS) for a fraction of the cost of just a few years ago. Microsoft continues to develop and market SQL Server, which should continue this trend for the near future.
Microsoft's SQL Server has sold over two million copies since it was introduced. The current version is 6.5, with the 7.0 version expected in late 1998. Before you learn how to use SQL Server, take a closer look at its history.
TIP: The location of up-to-the-minute news and support for SQL Server is http://www.microsoft.com/sql/. You can even download a 120-day trial version of SQL Server there.
IBM invented a computer language back in the 1970s designed specifically for database queries called SEQUEL; those letters stand for Structured English Query Language. Over time, the language has been added to so that it is not just a language for queries, but can also build databases and manage the database engine's security. IBM released SEQUEL into the public domain, where it became known as SQL. Because of this heritage you can pronounce it "sequel" or spell it out "S-Q-L." There are various versions of SQL used in today's database engines. Microsoft SQL Server uses a version called Transact-SQL, or T-SQL, which stands for Transaction SQL. Although you use Transact-SQL in this book, the emphasis is on installing, maintaining, and connecting to SQL Server. Sams also has a Teach Yourself Transact-SQL in 21 Days, which has more details on the language and its usage.
Microsoft initially coded a SQL server (a database product that understands the SQL language) with Sybase Corporation for OS/2. When Microsoft abandoned OS/2 in favor of its new network operating system, Windows NT Server, it decided to do the SQL server engine for Windows NT by itself. The resulting product was Microsoft SQL Server 4.2, which was updated to 4.21. After Microsoft and Sybase parted ways, Sybase further developed its database engine to run on NT (Sybase version 10 and now Version 11), and Microsoft developed SQL Server 6.0 (now 6.5), which also runs on top of Windows NT.
SQL Server 6.5 is implemented as a service on either Windows NT Workstation or Windows NT Server (which makes it run on Windows NT's server side), while the included utilities, such as Enterprise Manager, operate from the client side of Windows NT Server or Windows NT Workstation (see Figure 1.1).
New Term: A Service is an application that can be started by NT upon boot up; it adds functionality to the server side of NT.
Figure 1.1. The location of SQL Server and utilities on Windows NT Server.
SQL Server uses a type of database called a relational database.
Relational databases are when data is organized into tables within a database. Tables are organized by grouping data about the same subject. The tables are then related back to each other by the database engine when requested.
A database can generally be thought of as a collection of related data. In earlier database products a database was usually just a file--something like employee.dbf, which contained a single table of data. Columns relating to employee data such as salary, hire date, name, Social Security number, and so on, would be inside the employee.dbf file. There would be a row for each person in the company, with corresponding values in the appropriate columns. Indexes, used to speed data access, were held in a separate file, as was any security.
In SQL Server, a database is not necessarily tied to a file--it is more of a logical concept based on a collection of related objects. For example, a database in SQL Server not only contains the raw data, it also contains the database's structure and security, any indexes, and can contain other objects, such as views or stored procedures related to that particular database.
As you saw, a relational database is composed of different types of objects. These objects are all covered in more detail in the particular day that applies to them.
New Term: Some of the more common objects: Tables: These are the objects that contain the datatypes and actual raw data. Tables are the focus of Day 5, "Creating Tables."
Fields or Columns: These are part of the table that holds the data. Columns must be assigned a datatype and unique name.
Datatypes: There are various datatypes to choose from, such as character, numeric, date, or other. A single datatype is assigned to a field inside a table.
Stored Procedure: This is like a macro in that SQL code can be written and stored under a name. By running the name, you actually run the code. One use would be to take the SQL code that runs a weekly report and save it as a stored procedure; from then on, you'd only have to run the stored procedure in order to generate the report.
Triggers: A trigger is a stored procedure that activates when data is added, edited, or deleted from the database. Triggers are used to ensure that business rules are enforced in the database. For example, a trigger could ensure that every book in a bookstore has a valid publisher assigned to it.
Rules: A rule is assigned to a column so that data being entered has to conform to standards you set. For example, rules can be used to make sure that a person's phone number contains only numbers.
Primary Key: While not an object per se, keys are essential to relational databases. Primary keys enforce uniqueness among rows.
Foreign Key: Again, not an object per se, a foreign key is a file or column that references another table's primary key. SQL Server uses primary and foreign keys to relate the data back together from separate tables when a query is performed.
Constraints: A constraint is a server-based, system-implemented, data-integrity enforcement mechanism.
Defaults: A default can be set on a field so that if no data is entered, the default value is used. An example might be to set the area code to where most of your customers come from, thus saving you from entering the area code if they are local.
Views: A view is basically a query stored in the database that can reference one or many tables. It can be created and saved so that it can be easily used in the future. A view usually either excludes certain columns from a table or links two or more tables together. Index: An index is a way of organizing data so that queries run faster. Indexes are covered in detail in Day 13, "Indexing."
The following section on designing relational databases is important for two reasons:
As a SQL administrator, you will likely be given a relational database that has been designed by someone else--that doesn't mean you have to be clueless when it comes to designing a relational database. Knowing some dos and don'ts about designing databases normalization can only help you in your job.
Although the process of designing a good relational database could fill a book by itself, there are some basic steps to consider:
When you take related data and organize it into related tables, you are following rules called normalization rules, which are discussed here.
The design process should start with a good look at the business situation and what the customer is trying to accomplish. The design process starts with brainstorming about different variables, and how those all fit together in tables. The process moves to reports and queries that benefit the users, as well as other pieces of the design, including access to Web pages.
The following list of dos and don'ts will help you during the design process.
Dos of Database Design | Don'ts of Database Design |
Ask the users what they need. | Don't ignore the users. |
Create a list of variables. | Don't create variables you will never use. |
Keep variable names short, yet descriptive. | Don't use the minus sign because it is an illegal character in SQL Server. Avoid the underscore because it tends for long variable names. |
Organize variables into like groups. | Don't have a column so that it contains more than one value. |
Create the same variable in different tables in order to relate them. These become your primary and foreign keys. | Don't create tables with a huge number of columns (over 20). |
Test your design with some sample data. | Don't assume that because your design works well with 5 rows, that it will perform as fast with 50,000 rows. |
Create at least one index for tables that will be queried. | Don't create a lot of indexes per table (over five). |
Design your tables with security in mind. | Don't forget to set security on your data. |
Document variable names, table names, and primary and foreign keys. | Don't lose your documentation. |
The exercise at the end of this chapter goes through a simulated interview with a customer and proceeds to design a relational database for the customer.
A good database design starts with a thorough understanding of the situation and customer's desired outcome. That's why the people who design new systems are called analysts--they analyze the problem in detail and try to think of ways to solve the problem.
Sometimes an old-fashioned interview is the best way to find out exactly what the customer wants, especially if you don't fully understand exactly what the current situation is and what the goal is.
Use questions like these to probe for your customer's needs: What is working for you now?
What parts of the current system would you most like to replace?
Are there additional reports that you would like to be able to generate?
Of what items would you most like to keep track?
Is the data private or public?
Who needs access to the data and what kind of access should each user or group have?
Would you like the data posted on the Internet?
Would you like the public to be able to look up things via the Internet?
Do you have sufficient hardware in place to run both the database server and client software? If money and technology were no object, what would you like to incorporate into the new system? By asking these kinds of questions you can quickly build a sense of why a database is needed. Even though you might not be able to provide everything (given the limitations of the assigned budget, time frame, and hardware allowances), you will have the start of a long-term plan for database growth and expansion.
After the interview (you did take good notes, didn't you?) it is best to brainstorm about possible variables, including their names, types, and lengths. After the variables have been decided, they can be grouped into related tables.
SQL Server supports several different datatypes, including those for characters, values, dates, and money. More detail on datatypes is in Day 5.
Column names should be kept simple yet descriptive.
Lengths should be enough to satisfy all but the most extreme cases. When dealing with names, your limitation may be how many characters can be fit onto a mailing label, not how many to store.
Now that you have decided on the columns, you must organize the data into related tables, which is referred to as normalizing the data.
New Term: Normalization is the process of organizing data into related tables.
By normalizing the data, you are attempting to eliminate redundant data. For example, suppose the same customer buys two cars. His information would have to be entered twice in a single table database. What's worse, if the customer happened to move, you would have to change his address in both places, or your data would not be internally consistent. By entering his information once in a customer table and linking his record to any car purchase, you have not only eliminated redundant (and sometimes conflicting) data, you now have to change his record in only one spot (see Figure 1.2).
Figure 1.2. Organizing variables into a relational database.
There are rules established for the normalization of data. The rules are known as:
Because the table is a database's building block, it becomes apparent that well-designed tables (and thus, fields) are critical to the success of the database.
As with most things, planning and designing the database and tables are the hard part--actually creating them is the easy part. A table is composed of fields and contains related information, while a field is a single column or variable that can be stored. Day 4, "Devices and Databases," covers databases and their creation in more detail, and Day 5 covers tables in greater detail.
Microsoft's SQL Server is a client/server database engine, so understanding the client/server model is necessary. Client/server can be defined as an application that is split into two parts: One part runs on a server, and the other part runs on workstations. The server side of the application provides security, fault-tolerance, performance, concurrency, and reliable backups. The client side provides the user interface and can contain empty reports, queries, and forms. The idea is to have the best of both worlds by taking the advantages of both and pairing them.
SQL Server is the server part of the equation--there are various clients to choose from that can connect to SQL Server; that includes the utilities that come with SQL Server, such as ISQL/w.
Client advantages include:
Server advantages include:
When a query is run in client/server computing, the server searches the database and sends just those rows that match the client. This not only saves bandwidth, but can be faster than having the workstations perform the query, as long as the server is a powerful enough machine.
Although it was a big day, there is a lot of introduction material that really helps when you go to use SQL Server.
Microsoft's SQL Server is capturing more and more market share, and is a client/server-based relational database management system. SQL Server uses Transact-SQL as the SQL language's dialect.
SQL Server developers have the responsibility to design and implement the database, while the SQL Server administrators have the responsibility for the day-to-day management tasks.
A relational database is composed of tables, which contain related data. The process of breaking a database into related tables is called normalization.
Designing a good database starts with understanding the client's requirements for the database. The data can then be grouped into tables.
The purpose of this workshop is to give those who want it some experience designing the database that you use throughout the rest of the book.
Normalizing the database consists of taking related variables and grouping them into tables.
For example, look at this used car dealership database as if it were a spreadsheet. The columns include the following: VIN, Make, Model, Color, Mileage, Photograph, Air Conditioning, Size of Engine, Transmission, Four-Wheel Drive, Power Locks, Other Features, Cost, List Price, Asking Price, Date Bought, Date Sold, Deposit, Deposit Date, Customer Name, Customer Address, Customer Phone, Salesperson Name, Salesperson Address, Salesperson Commission, and Salesperson ID.
You have a total of 25 variables you want to save for each car. Putting them in a spreadsheet would work for a small shop, but a spreadsheet for large shops would quickly become cumbersome.
You need a unique identifier to link these tables to each other. Vehicles have a natural unique ID in their VIN numbers. Customers and salespeople can be assigned a unique number, or an existing identifier (like Social Security or last name and address). In your case you assign an ID to the salespeople and use the customer's Social Security number for an identifier.
You can see that there is related data when looking at vehicles, salespeople, cost and sales data, and customer data. Take the variables and put them with their appropriate table:
Vehicle: VIN, Make, Model, Color, Mileage, Photograph, Air Conditioning, Size of Engine, Transmission, Four-Wheel Drive, Power Locks, Other Features.
Salespeople: Salesperson Name, Salesperson Address, Salesperson Commission, Salesperson ID.
Sales Data: VIN, Cost, List Price, Asking Price, Date Bought, Date Sold, Deposit, Deposit Date, Customer ID, Salesperson ID.
Customer: Customer Name, Customer Address, Customer Phone, Customer ID, Comments.
In order to finalize the design, you need to choose your variable names, by choosing the datatype and length of each variable. Because you may want to alphabetize them later, split names into first and last.
The detailed design of a database is often referred to as the data dictionary, as it defines the variables and tables in the database.
The relations between your tables are shown in Figure 1.3, and your data dictionary is composed as follows:
Vehicle: Table Containing Vehicle Information and Has the Following Variables:
VIN--Primary Key--This is the vehicle identification number. This is a character field with a variable length of 30 (for non-standard cars, and so on).
Make--This is the car's make. Valid responses are something like Ford, Nissan, and the like. This is a character field with a variable length of 30.
Model--This is the car's model. Valid responses are something like Ranger, Altima, and the like. This is a character field with a variable length of 30.
Color--This is the car's color. This is a character field with a variable length of 20.
Mileage--This is the car's mileage. This is a numeric field with a length of 7 (maximum of 9,999,999--you do not track 1/10 of miles).
Figure 1.3. The Vehicles database, tables, and relationships.
Photograph--This is a photograph of the car, taken in 640x480x256 JPG format, so it can be published on the Internet easily. This is an image field.
A_C--This is to indicate whether air conditioning is present. Valid responses are Y or N. This is a character field set to a length of 1.
Size_of_Eng--This is to indicate how many cylinders the engine has. Valid responses are 3, 4, 6, 8, 10, and 12. This is a tinyint (tiny integer--ranges from 0-255) field.
Trans--This is the type of transmission, with valid responses being Standard or Automatic. This is a variable character field with a maximum length of nine.
FWD--This is to indicate whether the vehicle is four-wheel drive. Valid responses are Y or N. This is a character field set to a length of one.
PL--This is to indicate whether the vehicle has power locks. Valid responses are Y or N. This is a character field set to a length of one.
Vehicle_Comments--This field is used to hold comments about the vehicle, which is a text variable.
Salespeople: Table Containing Salespeople Information and Has the Following Variables:
Sales_ID--Primary Key--This is the salesperson's ID. All salespeople are assigned an ID to help track sales, and so on. This is a smallint (small integer--ranges from -32,768 to 32767) field.
Sales_FN--This is the salesperson's first name. This is a variable character field with a maximum of 30 characters.
Sales_LN--This is the salesperson's last name. This is a variable character field with a maximum of 30 characters.
Sales_Ad--This is the salesperson's address. This is a variable character field with a maximum of 30 characters.
Sales_City--This is the salesperson's city. This is a variable character field with a maximum of 30 characters.
Sales_St--This is the salesperson's state. This is a character field with two characters.
Sales_Zip--This is the salesperson's zip code. This is a character field with nine characters.
Sales_Comm--This is the salesperson's base percentage for his or her commission. This can later be used to calculate commissions based on profits on cars sold. This is a tinyint variable.
Sales Data: This Table Tracks Information about the Purchase and Sale of the Vehicle:
Invoice--Primary Key--This is the invoice number of the original purchase of the vehicle. This is a varchar field with a maximum length of 20.
VIN--Foreign Key--This is the same as the field found in the vehicle table, and is used to create a key to that table.
Customer_ID--Foreign Key--This is the same variable that is in the customer table. This field is a key to the customer table.
Sales_ID--Foreign Key--This is the same variable that is in the sales table. This is a key to the sales table.
Cost--This is the vehicle's actual cost. This is a smallmoney field (it can handle up to $214,000).
List_Price--This is the list (blue book) price for the car. This is a smallmoney field.
Ask_Price--This is the asking price. This may change due to a sale, advertisement, or incentive, and so on. This is a smallmoney field.
Date_Bought--This is the date that the vehicle was purchased by the dealer. This is a smalldatetime field, which keeps track of dates down to one minute intervals.
Date_Sold--This is when the vehicle was sold to a customer. This is a smalldatetime field.
Deposit--This is the amount of a deposit (if any) that the customer has put down on a vehicle. This is a smallmoney field.
Deposit Date--This is when the deposit was made. This is a smalldatetime field.
Customer: This Table Keeps Track of Customers:
Customer_FN--This is the customer's first name. This is a variable character field with a maximum of 30 characters.
Customer_LN--This is the customer's last name. This is a variable character field with a maximum of 30 characters.
Customer_Add--This is the customer's address. This is a variable character field with a maximum of 30 characters.
Customer_City--This is the customer's city. This is a variable character field with a maximum of 30 characters.
Customer_State--This is the customer's state. This is a character field with two characters.
Customer_Zip--This is the customer's zip code. This is a character field with nine characters.
Customer_Phone--This is the phone number of the customer. This is a 10-digit character field.
Customer_ID--Primary Key--This holds the Social Security number of the customer, which doubles as the ID number for the customer as well. This is a nine-digit character field.
Customer_Comments--This is a text field where comments about the customer can be added.
© Copyright, Macmillan Computer Publishing. All rights reserved.