Today you learn how to add tables to your database. In this chapter, you are first presented with an overview of tables. You then look at the different datatypes that are used to create your tables and learn how to put all of the information together to create and modify tables in your database. Finally, you learn how to estimate the size of your database. As with other chapters, you are presented with some real-world questions and answers, as well as a quiz and exercises.
Tables store all of the data in your database and are organized into rows and columns (records and fields). Each column in your table can store a specific type of information, or datatype.
A single table represents an entity in your database. Each row in the table represents an occurrence of that entity. The columns in your table describe the attributes of that entity. For example, you might create a table of employees. A single row in your Employees table would represent a single employee. The fields that make up that table would describe the employee. Some fields that you might include are EmployeeID, FirstName, LastName, SSN (Social Security number), and so on.
Now take a look at columns.
Each column in your table will need to be assigned a name, a datatype, a length, and nullability. Columns can be placed in any order in your table. Each column must also have a unique name within the table and follow the rules for identifiers.
The maximum length for a single column is 255 characters and you can have up to 250 columns in a row; however, a row may not exceed a data page (2,048 bytes) for data and overhead. You learn more about column lengths, as well as table and row overhead, later in this chapter.
Column names must be unique in the table and should follow the rules for SQL Server identifiers:
NOTE: While the use of symbols and embedded spaces in identifiers is allowed in SQL Server, most developers avoid their use.
Table 5.1 shows some examples of valid and invalid identifiers.
Identifier | Comments |
1001ArabianNights | Invalid identifier. Identifiers must begin with a letter. |
tbl100WaysToCookChickenProperly | Invalid identifier. Identifiers must be 30 characters or shorter. |
@Addresses | This is only a valid identifier for creating a variable. |
Tbl@Address | Invalid identifier. The @ symbol can only be the first character in an identifier. |
#tblCities | This identifier is only valid if you are creating a temporary object that is available to the user who created it. |
##tblCities | This identifier is only valid if you are creating a temporary object that is available to all users. |
tblEmployee | Valid identifier. While prefixing identifiers (tbl for table) is not necessary, it does make your SQL statements easier to read, as the prefix denotes what the identifier represents. Other common identifiers are qry for query, tr for trigger, sp for stored procedures, and so on. |
Titles | This is a valid identifier. |
Author_Review | This is a valid identifier. |
AuthorReview | This is a valid identifier. |
The datatype is used to specify information type (numeric, character, and so on) and how much space that information takes up in a particular column. You must choose your datatypes carefully, as they cannot be changed once a table has been created. To alter a column datatype you must drop the table and recreate it. Some datatypes have a variable length, while others have a specific length. SQL Server datatypes can be broken down into the following groups: string, binary, integer, approximate and exact numeric, special, date and time, money, auto-incrementing datatypes, synonyms, and user-defined. Table 5.2 lists the different datatypes allowed in SQL Server 6.5.
Category | Datatype | Comments |
String | char(n), varchar | Stores character strings. |
Binary | binary(n), varbinary | Stores binary information in two-byte pairs. |
Integer | int, smallint, tinyint | Stores integer values. |
Approximate Numeric | float, real | Stores approximate numeric information. |
Exact Numeric | decimal, numeric | Stores exact numeric information. |
Special | bit, text, image | Stores a single bit, character information greater than 255 bytes, or image data. |
Date and Time | datetime, smalldatetime | Stores dates and times. |
Money | money, smallmoney | Stores currency values. |
Auto-incrementing Datatypes | identity, timestamp | Stores values that are automatically incremented or set by the SQL Server. |
Synonyms | shown later | Maps ANSI datatypes to SQL Server datatypes. |
User-defined | shown later | You can create your own datatypes to store information. |
The nullability of a column refers to whether or not an entry is required for that column. If you would like to allow a column to be empty, then specify NULL. If you would like to insist that each row have an entry in that column, specify NOT NULL. If you do not specify NULL or NOT NULL, the system default will be used. When you first install SQL Server, the system default will be NOT NULL. As with most options things in SQL Server, this default can be modified.
You will now be presented with an in-depth discussion of each of the datatypes allowed in SQL Server.
Strings contain character data that is comprised of letters, numbers, and symbols. You can store character data in either a fixed-length or a variable-length format using the char(n) or varchar(n) keywords. You can store a maximum of 255 characters in these datatypes.
When you create a fixed-length field, you are specifying that this field will always contain n bytes of information. If the data that you entered in the field is less than n, it will be padded with spaces so that it always takes up n bytes. If you try to put more than n bytes of data in the field, it will be truncated. Table 5.3 shows you some examples of entering data into a field declared as Fname char(*). (The symbol * denotes a space in this example.)
Data Entered | Fname Contains |
Lawrence | Lawrence |
Mark Anthony | Mark Ant |
Denise | Denise** |
When you use variable-length fields, you specify the maximum length the field can be. Unlike fixed-length fields, variable-length fields are not padded with spaces. This may make your database more efficient from a memory standpoint, but you will pay a price in performance. When a field has been declared as variable-length, SQL Server will have to determine where that field stops and the next field begins. There is additional overhead associated with variable-length fields in the form of bytes added to the rows and the table. This overhead will be discussed in more detail later in this chapter. varchar is useful when you are expecting a wide variation in data size, or you are going to allow NULL values in your field. When you declare a field with char(n) and do not specify NOT NULL, the field will be converted to a varchar(n) datatype.
When you enter character data into SQL Server, you must enclose the data in single quotation marks. To enter NULL into a field in SQL Server, use the NULL keyword without quotation marks.
Here is an example of creating a table using the char and varchar keywords:
CREATE TABLE tblCustomers ( CustID char(8) NOT NULL, CustName varchar(30) NOT NULL, Email varchar(50), )
The binary datatype is used to store binary data. Binary data is comprised of the numbers 0 to 9 and the characters A to F stored in groups of two characters. For example, if you create a field like SomeData binary(20), you are specifying that you are going to have 20, 2-byte pairs. Although these binary datatypes look like hexadecimal values, they are not. They represent bit patterns.
As with the string datatypes, you can specify a maximum of 255, 2-byte pairs for both the binary(n) and varbinary(n) datatypes. If you use the binary(n) datatype, information that you enter will be padded with spaces. The varbinary(n) datatype does not pad. If you attempt to enter data that is longer than the specified maximum length, the data will be truncated.
To enter data into a binary datatype, you must precede the string with 0x. For example, to enter the value 10 into a Binary field, you would prefix it like this: 0x10.
As with char and varchar, if you do not specify NOT NULL, a binary column will be created as varbinary.
Here are some sample statements:
MyIcons varbinary(255) MyCursors binary(200) TinyWav varbinary(255)
There are three types of integer datatypes (int, smallint, and tinyint) which store exact, scalar values. The difference between the integer datatypes is the amount of storage space they require and the range of values they can store. Table 5.4 displays the integer datatypes and their ranges.
Datatype | Length | Range |
tinyint | 1 | 0-255 |
smallint | 2 | ± 32,767 |
int | 4 | ± 2,147,483,647 |
Here are some sample declarations and the data they could store:
EmployeeAge tinyint NULL EmployeeID smallint NOT NULL CustomerID int NOT NULL
SQL Server allows both approximate datatypes float and real as well as exact numeric datatypes decimal and numeric. When you declare approximate datatypes, you specify a precision which is the maximum number of digits allowed on both sides of the decimal point. When you declare an exact datatype, you must also specify a scale which is the total number of digits allowed on the right side of the decimal point.
Table 5.5 lists the precision values allowed for both approximate and exact numeric datatypes and the number of storage bytes required.
Precision | Storage | Precision | Storage |
1 - 2 | 2 | 20 - 21 | 10 |
3 - 4 | 3 | 22 - 24 | 11 |
5 - 7 | 4 | 25 - 26 | 12 |
8 - 9 | 5 | 27 - 28 | 13 |
10 - 12 | 6 | 29 - 31 | 14 |
13 - 14 | 7 | 32 - 33 | 15 |
15 - 16 | 8 | 34 - 36 | 16 |
17 - 19 | 9 | 37 - 38 | 17 |
NOTE: SQL Server supports a default maximum precision of 28 digits. This is to ensure compatibility with the limits of front-end development tools like Microsoft Visual Basic and PowerBuilder. It is possible to modify this value.
The approximate numeric datatypes are float(n) and real. The numbers stored in these datatypes are comprised of two parts: the mantissa and the exponent. The algorithm used to produce these two parts is not exactly precise. In other words, you might not get back exactly what you put in. This only becomes a problem when the precision of the number stored approaches the precision specified by the datatype. For example, the precision allowed for floats is up to 38 digits. Floats and reals are useful for scientific and statistical data where absolute accuracy may not be necessary, but where your values range from extremely small to extremely large numbers.
Reals have a precision of 7 digits and require 4 bytes of data storage. If you declare a float and specify a precision of less than 7, you are really creating a real datatype. Floats can have a precision from 1 to 38. By default, a float has a 15-digit precision if no value is specified. You can perform all calculations on a float with the exception of modulo. (Modulo returns the integer remainder of integer division.)
For example, say you create the following datatype:
SomeVal real
You could store the numbers 188,445.2 or 1,884.452, but not the values 188,445.27 or 1,884.4527, as they are longer than the default precision of seven digits. To hold these larger values, you should create a float variable with a precision large enough to hold all of the digits. Since there are eight digits total, you would declare your datatype like this:
SomeVal float(8)
Exact numeric datatypes are decimal and numeric. Accuracy will be preserved to the least significant digit. When you declare an exact numeric datatype, you should specify both a precision and a scale.
If you do not specify a precision and scale, SQL Server will use the default values of 18 and 0. Here is something interesting: If you specify a scale of 0, you are creating the equivalent of an integer datatype, as there can be 0 digits to the right of the decimal point.
If a column is declared as decimal(7,2), it can hold the number 1000.55 and 11000.55, but not the numbers 11110000.55 or 1100.5678. If you attempt to place a number larger than is allowed by the precision and scale, the number will be truncated.
Here is an example of using exact numeric datatypes:
CREATE TABLE tblGold ( AtomicWeight float(8,4) MolesPerOunce float(12,6) Density numeric(5,4) )
There are several datatypes that just don't fall well into any category. They're added here in the special datatypes section. You will look at the bit datatype and the two BLOB (Binary Large Objects) datatypes, which are text and image.
The bit datatype is a logical datatype and is used to store Boolean information. Boolean datatypes are used as flags to signify things like on/off, true/false, and yes/no. The values stored here are either a 0 or a 1.
Bit columns do not allow NULL values and cannot be indexed. Bit datatypes require a single byte of storage space. If you have several bit columns defined in a table, SQL Server will automatically group up to 8 bit fields into a single byte of storage space. Following is an example of using the bit datatype:
Gender bit NOT NULL Paid bit NOT NULL Printed bit NOT NULL
Text and image datatypes are used when storage requirements exceed the 255 character column limitations. These datatypes are often referred to as blobs or binary large objects. The text and image datatypes can store up to 2GB of binary data per-declaration.
When you declare a text or image datatype, a 16-byte pointer will be added to the row. This 16-byte pointer will point to a separate 2KB data page where your data will be stored. If your data exceeds that first 2LB data page, another 16-byte pointer on the data page will point to the next 2KB data page and so on, until your data is saved or you reach the 2-gigabyte limit.
The storage and retrieval of text and image data can seriously hamper your database performance, because large amounts of data will be applied to your transaction logs during inserts, updates, and deletes. You can get around this problem by using the WRITETEXT command, which will apply changes to the data without making a corresponding entry to the transaction log.
NOTE: When you use non-logged operations, you should back up your database immediately, as the recoverability of your database is now at issue.
It would be wise to evaluate alternatives to storing images and large text fields. For example, if your text field is always going to be 500 characters or less, you might split the text field into two 250 character char(250) fields. This will save roughly 1,500 bytes of storage space over declaring a single text datatype, because the text datatype will create a 2KB data page reserved for that declaration. The extra 1,500 bytes will be wasted space.
Another alternative is to store text and image data as separate files, and then store the path to those files in your database.
Here are some example declarations of the text and image datatypes:
EmployeePhoto image ScannedContracts image Description text Comments text
Date and time data can be stored in either a datetime or a smalldatetime datatype.
Date and time data can take several different formats. You can specify the month using the full name or an abbreviation. The case is ignored and commas are optional.
Here are some examples using the alpha formats for April 15, 1996:
Apr 15 1996 Apr 15 96 Apr 96 15 15 Apr 96 1996 April 115 1996 15 April
NOTE: If only the last two digits of the year are given, SQL Server will interpret values of less than 50 as 20yy, while numbers greater-than or equal-to 50 will be interpreted as 19yy. For example, April 15 03 would be interpreted as April 15, 2003.This is different from the Microsoft Visual Basic 5.0 and the Office 95/97 products, which interpret values less than 30 as 20yy and values greater-than or equal-to 30 as 19yy.
You can also specify the ordinal value for the month. The ordinal value of an item is the positional value within a list of items. In the previous examples, April is the fourth month of the year, so you can use the number 4 as its designation.
Here are some examples using the ordinal value for April 15, 1996:
4/15/96 (mdy) 4-15-96 (mdy) 4.15.96 (mdy) 4/96/15 (myd) 15/96/04 (dym) 96/15/04 (ymd)
There are also several different time formats you can use. Here are some examples:
16:30 (4 hrs, 30 mins) 16:30:20:999 (4 hrs, 30 mins, 22 seconds, 999 milliseconds) 4:30PM
Dates stored in the datetime datatype are stored to the millisecond. A total of eight bytes are used, four for the number of days since January 1, 1900 and four for the number of seconds past midnight. (Dates prior to this are stored as negative numbers, making the range of dates 1/1/1753 to 12/31/9999.) The accuracy of your dates is within 3.33 milliseconds.
With the smalldatetime datatype, a total of four bytes are used. Dates stored this way are accurate to the minute. Internally, one smallint (two bytes) is used for the number of days after January 1, 1900, and the other smallint is used for the number of seconds past midnight. The range for a smalldatetime is 1/1/1900 to 6/6/2079.
Table 5.6 delineates the datetime datatypes.
Datatype | Storage | Range |
datetime | 8 | 1/1/1753-12/31/9999 |
smalldatetime | 4 | 1/1/1900-6/6/2079 |
TIP: Use smalldatetime for current dates in databases, especially those that are transitory in nature. These would be dates that you are not going to be using for more than a few years.Do not use datetime to store partial dates, such as only the month, day, or year. If the only data needed is the year, a smallint or tinyint would be much more efficient. If you do not do this, you will have to parse the date yourself every time you want to insert, update, or otherwise work with the information.
There are two money datatypes: money and smallmoney. Both have a scale of 4. These datatypes can store information other than dollar values for international use, but there are no monetary conversion functions available in SQL Server. When you enter monetary data, you should precede it with a dollar sign. Table 5.7 shows you the money datatypes, their requirements, and their ranges.
Datatype | Storage | Range |
money | 8 | ±922,337,203,685,447.5808 |
smallmoney | 4 | ±214,748,3647 |
As you can see, smallmoney can hold up to 10 digits with a scale of 4. The money datatype is large enough to hold the U.S. national debt with values in the hundreds of trillions.
Here are a couple sample declarations using the money datatypes:
AccountsReceivable money AccountsPayable smallmoney
There are two automatically updating datatypes, identity and timestamp. Every time you add a new record to a table with an identity or timestamp field, those values will automatically be added. Timestamps go a bit further, if you make an update to a row. The identity field will retain its current value, but the timestamp will automatically update itself.
The identity field is really a field property that can be assigned to any of the integer datatypes or the approximate numeric datatypes. The identity property will generate incremental values for new rows in your table based on a seed value and an increment (identity(s,i). If you do not specify these values, both will default to 1.
A column defined as the identity column will contain SQL Server-generated values that are unique for each row within a table. The identity field is similar to the auto-number field in a Microsoft Access database.
You are allowed only one identity field per table. The identity property will be discussed in more detail on Day 14, "Data Integrity." Here are some examples of using the identity field:
EmployeeID int identity(2,2) OrderNumber tinyint identity
The creation of a timestamp datatype creates a SQL Server-generated, unique, automatically updated value. Although the timestamp looks like a datetime datatype, it is not. timestamps are stored as binary(8) for NOT NULL columns, or varbinary(8) if the column is marked as NULL. You can have no more than one timestamp field per row.
NOTE: The timestamp datatype does not reflect the system time. It is simply a monotonically increasing counter value.
You can use the timestamp datatype to track the order in which items are added and modified in your table.
Here are some examples using the timestamp datatype:
LastModified timestamp NOT NULL PhoneCall timestamp NOT NULL
To insure that SQL Server datatypes map to ANSI datatypes, you can use the ANSI datatypes in place of the SQL Server datatypes. Table 5.8 lists the ANSI datatypes and their SQL Server equivalents.
ANSI Datatype | SQL Server Datatype |
character | char |
character(n) | char(n) |
char varying | varchar |
character varying(n) | varchar(n) |
binary varying | varbinary |
dec | decimal |
double precision | float |
float(n) n = 1 - 7 | real |
float(n) n = 8 - 15 | float |
integer | int |
User-defined datatypes can be created for a specific database or placed in the model database. Remember, the model database is used as a template for creating new databases. In this fashion, you can have your new user-defined datatype available in all subsequent databases.
To create a user-defined datatype, you must create it using the system-provided datatypes. For example, you can create a new datatype called EmployeeID and define it as character or integer, but not as some non-existent datatype like column_id.
You must create the user-defined datatype before you add it to a table. To create a user-defined datatype, you can use the Enterprise Manager or the sp_addtype stored procedure.
To create a user-defined datatype with the SQL Enterprise Manager, follow these steps:
Figure 5.1. Manage User-Defined Datatypes dialog box.
As you can see in Figure 5.1, three user-defined datatypes have been created on the pubs database: Empid with a char(9) value, id as a varchar(11), and tid with a varchar(6).
To create user-defined datatypes using SQL, you use the sp_addtype stored procedure. For example, to add the same three user-defined datatypes to the pubs database, you could run the following SQL:
sp_addtype empid char(9) NULL sp_addtype id varchar(11) NULL sp_addtype tid varchar(6) NULL
Once you have declared a user-defined datatype, you can use it as many times as you would like in your database. For example, you could use the following CREATE TABLE statement.
CREATE TABLE tblEmployee ( EmployeeId empid NOT NULL, Fname char(15), Lname char(20), PensionPlan id, )
You can use the SQL Enterprise Manager or the sp_droptype stored procedure to drop defined datatypes. In the Enterprise Manager, click your database, select User-Defined Datatypes from the Manage menu. In the Manage User-Defined Datatypes dialog box, click the datatype you want to drop and click Drop.
Using SQL, you can run the sp_droptype stored procedure like this:
sp_droptype empid
NOTE: A user-defined datatype cannot be dropped if it is still in use in a table, or has a rule or default bound to it.
Now that you have seen all of the datatypes available in SQL Server, it's time to put it all together with the CREATE TABLE statement. As with most things in SQL Server, there are two ways to create tables. You can use the SQL Enterprise Manager or you can use SQL scripts. You will first learn more about the CREATE TABLE statement.
CREATE TABLE [database.[owner.]table_name ( column_name datatype [identity|constraint|NULL|NOT NULL] [...] )
Where:
Here is a sample CREATE TABLE statement:
CREATE TABLE employees ( emp_id tinyint IDENTITY NOT NULL, Fname char(15), Lname char(20) NOT NULL, Address1 varchar(30), Address2 varchar(30), city varchar(30), state char(2), zipcode char(10), start_date datetime )
You can visually create your tables by using the SQL Enterprise Manager. To create a table using the Enterprise Manager, follow these steps:
Figure 5.2. The Manage Tables dialog box.
Now that you have your tables created, you can begin the task of determining the table size and row size.
You can easily drop tables in the Enterprise Manager by right-clicking the table and selecting Drop from the context menu. To drop a table using SQL, you should run the DROP TABLE statement. For example, to drop the tblEmployees table, you would run:
DROP TABLE tblEmployees
NOTE: System tables cannot be dropped. Don't delete tables from the pubs sample database (you may want to use them later). However, if you should accidentally do so, you can reinstall pubs. Run the INSTPUBS.SQL script, which is located in the SQL server Install directory (\MSSQL\Install) to recreate your pubs database.
There are three major categories of objects that take space in a database.
If you take the total number of megabytes necessary to store each table, you have a pretty good estimate for the size of the database. After adding the room needed for indexes and other objects (such as stored procedures), you can get a good estimate for the total size needed for the database.
Once the total number of rows and how many rows can fit on each page are known, the size of tables (and thus, databases) can be estimated.
For example, suppose you have 10,000 rows, and you have computed that you can fit 10 rows per data page. That leaves 1,000 data pages (10,000/10), which equates to 2,000KB or 2MB (1000x2KB=2,000KB or 2MB). That particular table, with 10,000 rows, would require approximately 2MB in the database.
Continue to do these type of estimates on each table, and you will have a good estimate about how much space you will need in your database to fit your data.
Each object takes an extent (8x2KB) in the database. Simply add the number of objects, multiply by 16KB, and you will have a good estimate of space required by miscellaneous objects.
For example, suppose your database has 37 miscellaneous objects. Those objects would need at least 592KB (37x16KB) space in the database. Of course, if you have a very large object, such as a stored procedure that is many pages long, it may require a second extent to hold it. Simply count very large objects twice for calculation purposes.
For a complete discussion of indexes and how they work in SQL Server please see Day 13, "Indexing." SQL Server 6.5 Unleashed also has very detailed information on indexes, and exactly how much space they require. The following guidelines may help in your estimates: Clustered indexes take much less room than non-clustered indexes.
The more columns you index on, the larger the index. Multiple indexes per table require more room.
SQL Server has many native datatypes that you can use to create tables. The datatype for each column must be chosen carefully because the choice can impact performance and the need for storage space. In addition, the datatype cannot be changed without dropping and recreating the table.
In addition, SQL Server also allows the creation of user-defined datatypes. The sp_addtype system stored procedure or the Manage User-Defined Datatypes dialog box can be used to create them. They must be based on a native datatype and they must be defined before they are used in a table.
Tables can be created using the SQL Enterprise Manage table editor or with the CREATE TABLE statement.
CREATE TABLE table1 ( id char[10], fname char[15] NULL, lname char[15] NOT NULL, comment varchar[255] )
CREATE TABLE phooey (id char[10] NOT NULL, fname char[15] NULL, lname char[15] NOT NULL, comment varchar[255], notes text, directions varchar[255], house_picture image)
Instructions: First write down the answers to the following. When you are sure your answers are correct, create these objects in the database.
zip_code
phone_number
store_id
fax email
© Copyright, Macmillan Computer Publishing. All rights reserved.