One of the most important responsibilities of a database designer is to correctly define a database table for optimal performance. SQL Server's basic design of a table doesn't define how data is to be accessed or stored physically, beyond the data-type constraints and any referential constraint placed on a column or columns designated as PRIMARY KEY. Instead, SQL Server provides a mechanism of indexes or keys to a table that help SQL Server optimize responses to queries.
Without an index, SQL Server must table scan, or read every row in a table, before it can know the answer to any given query. In large tables this is obviously an expensive option for the server to take. Indexes provide a way for SQL Server to organize pointers to the data required. An index in a database works the same way as an index in a reference book. Like an index in a book, an index in a database is a list of "important" values that have references to pages in the database table containing the information that matches the index value. This allows the database to read from a typically smaller list of index pages that will, in turn, point to the data to answer any given request.
Indexes are SQL Server's internal method of organizing the data in a table in such a way that it can be retrieved optimally. Optimal, in this case, refers to the quickest way. Indexes are collections of unique values in a given table and their corresponding list of pointers to the pages of data where those values are physically represented in a table.
At a high level, indexes are a shorthand way of the database recording information that it's storing in tables. Indexes are just another kind of object in the database and have storage needs like tables. Just as tables require pages of data to store their rows in, indexes require pages to store their summary data in. The advantage of an index is that, generally, it reduces the number of I/Os required to reach any given piece of data in a table.
When you create an index in SQL Server, you tell the database to scan the table, gather the discrete values in the particular column(s) being indexed, and then write a list of data pages and row identifiers to the index page that match the value being indexed. This allows the server to scan a list of index pages before choosing to scan the whole table, looking for matching data.
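As a minimal sketch of this (the index name is just illustrative; the authors table is from the pubs sample database), creating a simple nonclustered index might look like this:

/* Build a nonclustered index over the authors' last names.
   SQL Server scans the table once, gathers the au_lname values, and
   writes index pages that point to the data pages holding those values. */
CREATE INDEX au_lname_ind ON authors (au_lname)
go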
Working with indexes is a test-and-test-again proposition. You're faced with many different variables that you'll need to weigh against one another in terms of performance and the end-user experience. For example, the indexes you create will generally be driven by the data-retrieval habits of the applications on the system. These patterns of gathering information from your tables will certainly be a key factor in deciding which columns you index.
Just when you thought this might be easy, it turns out you can't analyze only the retrieval situations you'll be supporting. Not only must you consider retrieval but also insertion and update processes. This might seem strange at first, but when you think about what's happening behind the scenes, you'll see that the difference between an application that queries, updates, and inserts and an application that is query-only is night and day.
When indexes are updated, that's when the work happens. This is a simple, but very important, rule of thumb. Therefore, if you have a system where you're doing a lot of inserts, you'll do well to limit the indexes that are active on the database tables. When records are inserted or updated, the indexes must be updated as well. In fact, when you update a record in SQL Server, you have the same amount of overhead associated with the transaction as you do with an insert. In general, the magic number on these types of systems is five indexes. As you exceed this number, the performance hit you'll take can be excessive.
On the other hand, if you have a query-centric system, one where few inserts are happening and the focus is more on the analysis of existing information, you can implement all the indexes you need to improve query performance. Your limiting factor in this case becomes disk space, since each index will require additional database space.
If you take the time to create an index, make sure you also take the time to determine what you are targeting with the index. This means that you should understand what types of queries will be made to the database. Specifically, this means understanding what columns will be in the WHERE clauses of the SELECT statements.
As you'll see later, when you create an index, it's created by indicating what fields you want to include in the index. The key is that when SQL Server considers an index, one of the field qualifiers to the SELECT statement must be the first column in the index. If not, the index will simply not be used.
For example, if you have an index on the ZIP code and state fields of your database, in that order, and you query the database for a record based on state, the index will not be used. SQL Server will have to work out a different means of retrieving the row(s) that meet your request. Depending on a number of factors, this can mean a table scan, the enemy of performance.
On the other hand, if your SELECT statement queries only on ZIP code, it can use the index, regardless of the fact that the index also includes the state. SQL Server simply ignores that component of the index, and your application will still benefit greatly from it.
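As a sketch of this behavior (the customers table, its columns, and the index name here are hypothetical), consider a composite index built on ZIP code and then state:

/* Composite index: zip is the leading column, state is second */
CREATE INDEX zip_state_ind ON customers (zip, state)
go
/* Uses the index: the leading column, zip, appears in the WHERE clause */
SELECT * FROM customers WHERE zip = '98052'
go
/* Cannot use the index: state is not the leading column, so SQL Server
   must find another access path, possibly a table scan */
SELECT * FROM customers WHERE state = 'WA'
go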
NOTE: SQL Server manages the decision about which indexes to use to fulfill a given query through a series of pages that summarize the distribution of information in the table. This summary information is how SQL Server determines whether a table scan or an indexed query will be the fastest way to satisfy a query. If you create your table with data, this summary information is automatically established for you. If, on the other hand, you create the table and then add data to it, the summary page will not be updated. In this case, and in any case where you're adding, changing, or deleting 10-20% of the information in a given table, you need to update the summary information. You do this by issuing an UPDATE STATISTICS command. The syntax is
UPDATE STATISTICS table [index]

where table is the name of the table you want to base the updated summary information on, and index is an optional argument indicating the specific index you want to update statistics for.
This command is never a destructive step and can help performance significantly by allowing SQL Server to learn from the real data in your database.
First, it's important to understand that you cannot create an index on a column with a datatype of BIT, TEXT, or IMAGE. That said, what general things are helpful to look for in determining what columns to index?
There are some good general rules to follow in looking for good index candidates. (See Table 11.1)
Description | Notes
Foreign keys | Tables with foreign keys are prime candidates for indexes. There are very few cases where you should not have an index on a foreign key; it helps lookup times immensely.
Large results-set queries | For the queries that will be returning a large dataset, you should index whenever possible. Note that this typically breaks the rules of good client/server methodology: you want to always be looking for ways to decrease the size of the results set, not support large datasets. Nonetheless, the requirement will undoubtedly arise, and you'll want to support it well with an index.
ORDER BY and GROUP BY support | Columns that are referenced in these clauses will benefit greatly from an index.
SQL Server has two methods of creating indexes. There is a graphical method provided in SQL Enterprise Manager and a Transact-SQL interface using the CREATE INDEX statement. Only the table's owner can create an index on a table.
NOTE: SQL Enterprise Manager has a limitation when you create indexes through it that you can't specify a data segment on which to create the index. Moving indexes on to different data segments can significantly improve performance on nonclustered indexes because multiple I/O threads can be used to read the data from the index and data pages concurrently. Use the CREATE INDEX statement in ISQL/w to create an index if you need to specify a segment for the index data.
Creating an Index with SQL Enterprise Manager To create an index using SQL Enterprise Manager, follow these steps:
FIG. 11.1
SQL Enterprise Manager's Explorer view after having just been started. Note that
no server is selected.
FIG. 11.2
You use the Explorer-type view from within the Enterprise Manager to select the table
objects you want to work with.
FIG. 11.3
The authors table is selected in the top-left combo box, and the aunmind index
is selected in the top-right combo box.
FIG. 11.4
The Manage Indexes dialog box is ready to build a new index.
TIP: Prefacing index names with pk for primary key or fk for foreign key makes it easier to identify the index type without having to inspect its properties.
FIG. 11.5
The Index Build message box seeks confirmation of whether the index should be
built immediately or scheduled as a task.
Creating an Index with CREATE INDEX The Transact-SQL command CREATE INDEX is used by SQL Enterprise Manager to perform the index creation when the Build button is clicked in the Manage Indexes dialog box. The syntax for CREATE INDEX is shown in Listing 11.1.
CREATE [UNIQUE] [CLUSTERED | NONCLUSTERED] INDEX index_name
ON [[database.]owner.]table_name (column_name [, column_name]...)
[WITH
    [FILLFACTOR = x]
    [[,] IGNORE_DUP_KEY]
    [[,] {SORTED_DATA | SORTED_DATA_REORG}]
    [[,] {IGNORE_DUP_ROW | ALLOW_DUP_ROW}]]
[ON segment_name]
The options for the Transact-SQL command CREATE INDEX are covered in the following sections.
UNIQUE If an index is created as UNIQUE, SQL Server disallows duplicate values in the index and, therefore, stops a client from inserting a duplicate record into the base table. This is the most common use of an index to enforce integrity on a table. Unique indexes can't be created on tables that already have duplicate values in the columns being indexed; the duplicate data must be removed first. If enabled, the IGNORE_DUP_KEY option, described later in this chapter, allows multirow UPDATE or INSERT statements that modify index keys to complete even if some of the new key values would be duplicates. The duplicate values are rolled back, the rest of the transaction continues, and no error is generated.
NOTE: If you define an index as being unique, and you're declaring it on a column that allows NULL values, you'll run into problems as the table becomes populated. This is because SQL Server will allow a single NULL value in the column during indexing, but when the second NULL column value is inserted, it will fail. This is because the column will allow only one of each value (UNIQUE), and in this case, NULLs are considered a distinct and unique value, of which there can be only one. In short, if you're defining a UNIQUE key, be sure that none of the columns included in the key allow NULLs.
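A minimal sketch of the problem, using a hypothetical phone_list table with a nullable extension column:

CREATE TABLE phone_list (
    emp_id int NOT NULL,
    extension char (4) NULL)
go
CREATE UNIQUE INDEX ext_ind ON phone_list (extension)
go
INSERT phone_list VALUES (1, '1001')   /* succeeds */
INSERT phone_list VALUES (2, NULL)     /* succeeds - the first NULL */
INSERT phone_list VALUES (3, NULL)     /* fails - a second NULL violates the unique index */
go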
CLUSTERED A clustered index is a special index that forces SQL Server to store the table data in the exact order of the index. Using a clustered index to physically store the data in a particular order can greatly improve access performance to the table. Data requested by the index key value, whether for an individual record or for a range of records, can be found very quickly because SQL Server knows the data is right next to the index page, and any further values are guaranteed to be in the following data pages.
TIP: You should always have an index on the table defined as CLUSTERED. This will greatly improve performance by keeping the database in a managed state, but, moreover, it allows you to reuse space from deleted rows. This is because a clustered index will reclaim space in the index and data pages as new rows are inserted.
If CLUSTERED isn't specified in the CREATE INDEX statement, the index is assumed to be NONCLUSTERED.
There can be only one clustered index per table because the data can be in only one physical order.
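A brief sketch, using a hypothetical orders table clustered on its order date so that range scans read physically adjacent pages:

/* Only one clustered index is allowed, because it dictates the
   physical order of the rows in the table */
CREATE CLUSTERED INDEX order_date_ind ON orders (order_date)
go
/* A range query on the clustering key now reads adjacent data pages */
SELECT * FROM orders
WHERE order_date BETWEEN '19951201' AND '19951231'
go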
CAUTION: When you create a clustered index, since the information in the table is physically moved, the entire table is locked, preventing access from any client application. If you're indexing a production table, consider using the Scheduled Index Build option and run the creation of the index after hours.
CAUTION: Specifying a segment for a clustered index to be placed on will actually move the table data too. Be careful using the ON segment_name keyword when creating a clustered index. You must have approximately 1.2 times the space required for the entire table available on the target segment for the clustered index and data. If you do not, the segment free space will be filled and a new segment will need to be created. SQL Server won't warn you before index creation that the segment space is inadequate because it has no way of knowing or accurately estimating the size required for the index.
NONCLUSTERED This is the default index type and means that SQL Server will create an index whose pages of index data contain pointers to the actual pages of table data in the database. You can create up to 249 nonclustered indexes in a table.
index_name An index name must be unique by table. This means that the same index name can be given to two indexes, provided that they're indexing different base tables in the database. Index names must follow standard SQL Server naming conventions for objects.
table_name table_name is the table that's going to be indexed.
column_name This is the column that's being indexed. If more than one column is placed here, a composite or compound index is created. Multiple columns should be separated by commas. You can specify up to 16 columns to create a composite key, but the combined width of the data types being indexed can't exceed 256 bytes.
FILLFACTOR = x Specifying a FILLFACTOR on an index tells SQL Server how to "pack" the index data into the index data pages. FILLFACTOR tells SQL Server to preserve space on the index data page for other similar rows that are expected for the same index keys or similar index key values.
CAUTION: The FILLFACTOR should rarely be used. It is included solely for fine-tuning purposes. Even for fine-tuning, it should be used only if future changes in the data can be made with accuracy.
Specifying a FILLFACTOR for frequently inserted database tables can improve performance because SQL Server won't have to split data onto separate data pages when the index information is too big to fit on a single page. Page splitting is a costly operation in terms of I/O and should be avoided if possible.
The FILLFACTOR value is the percentage of each index page that SQL Server fills with data when the index is created; the remainder of each page is left free for rows that are added later.
A small FILLFACTOR is useful for creating indexes for tables that don't yet contain their complete dataset. For example, if you know that a table is going to have many more values than it does now and you want SQL Server to preallocate space in the index pages for those values so that it won't need to page split, specify a low FILLFACTOR of about 10. A page split occurs when the index fills up so that no further values will fit in the current 2K data page. Consequently, SQL Server "splits" the page in two and puts references to the newly created page in the original page.
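As a sketch, a sparse FILLFACTOR on a table that's expected to grow could be requested like this (the index name is just illustrative; authors is the pubs sample table):

/* Fill each index page only about 10 percent at creation time,
   leaving room for future inserts without page splits */
CREATE INDEX loose_lname_ind ON authors (au_lname, au_fname) WITH FILLFACTOR = 10
go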
A high FILLFACTOR will force more frequent page splits because SQL Server will have no room on the index page to add any additional values that may be necessary if a record is inserted into the table. A FILLFACTOR of 100 will force SQL Server to completely fill the index pages. This option is good for highly concurrent, read-only tables. It's inadvisable, however, for tables that are inserted or updated frequently. Every insert will cause a page split, and many of the updates, if key values are updated, will also cause page splits.
If no FILLFACTOR is specified, the server default, usually 0, is used. To change the server default, use the system-stored procedure sp_configure.
See "Displaying and Setting Server Options," Chapter 13.
Be careful when specifying a FILLFACTOR for a clustered index. It will directly affect the amount of space required for the storage of the table data. Because a clustered index is bound to a table, as the physical order of the table data is mapped to the order of the clustered index, a FILLFACTOR on the index will space each data page of the table apart from the others according to the value requested. This can consume substantial amounts of disk space if the FILLFACTOR is sparse.
NOTE: Specifying a FILLFACTOR when creating an index on a table without data has no effect because SQL Server has no way of placing the data in the index correctly. For tables that have dynamic datasets that need to be indexed with an index specifying a FILLFACTOR, you should rebuild indexes periodically to make sure that SQL Server is actually populating the index pages correctly.
IGNORE_DUP_KEY This option controls SQL Server's behavior when an executed statement would create duplicate key values in a table that has a unique index defined on it. By default, SQL Server always rejects a duplicate record and returns an error. This option tells SQL Server to continue processing as though this isn't an error condition. It can be useful in highly accessed tables where the general trend of the data is more important than the actual specifics. It shouldn't be used for tables where each individual record is important, however, unless application code is providing appropriate referential constraints on the data. If multiple records are affected by an UPDATE or INSERT statement and the statement causes some records to create duplicates, the statement is allowed to continue; the records that created duplicates are rolled back with no error returned.
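A minimal sketch of this behavior, assuming hypothetical parts and new_parts tables:

CREATE UNIQUE INDEX part_code_ind ON parts (part_code) WITH IGNORE_DUP_KEY
go
/* If this multirow insert produces duplicate part_code values, the
   duplicate rows are discarded with a warning, but the statement and
   the remaining rows are allowed to complete */
INSERT parts (part_code, part_name)
    SELECT part_code, part_name FROM new_parts
go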
CAUTION: When enabling IGNORE_DUP_KEY, be careful that you don't lose required data due to unwanted updates occurring. If IGNORE_DUP_KEY is enabled for a unique index and an update causes a duplicate record to exist, not only is the duplicate rejected by the update, but the original record is also removed. This is because SQL Server performs updates by deleting the record and then reinserting it. The reinsertion fails because the row would be a duplicate, so neither the original record nor the updated record exists afterward.
SORTED_DATA SQL Server uses the SORTED_DATA keyword to speed up index creation for clustered indexes. By specifying SORTED_DATA, you're telling SQL Server that the data to be indexed is already physically sorted in the order of the index. SQL Server will verify that the order is indeed correct during index creation by checking that each indexed item is greater than the previous item. If any item isn't found to be sorted, SQL Server will report an error and abort index creation. If this option isn't specified, SQL Server will sort the data for you as it would do normally. Using the SORTED_DATA keyword greatly reduces the amount of time and space required to create a clustered index. The time is reduced because SQL Server doesn't spend any time ordering the data. The required space is reduced because SQL Server no longer needs to create a temporary workspace in which to place the sorted values before creating the index.
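For example, if the data has already been loaded in key order (say, by a bulk copy), the sort step can be skipped; the table and index names here are hypothetical:

/* The rows in order_history are already physically ordered by order_id,
   so SQL Server only verifies the order instead of sorting, and no
   temporary sort workspace is needed */
CREATE CLUSTERED INDEX order_id_ind ON order_history (order_id) WITH SORTED_DATA
go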
SORTED_DATA_REORG SORTED_DATA_REORG is similar to SORTED_DATA in that it helps SQL Server's overall performance by making the data physically reside in the database table in the order of the index. The SORTED_DATA_REORG keyword tells SQL Server to physically reorder the data in the order of the index. This can be especially useful on nonclustered indexed tables on which you want to reduce the number of page splits due to data no longer being in adjacent data pages. This will help the data be physically adjacent in the database and will reduce the number of nonsequential physical I/Os required to fetch data, which in turn improves performance.
IGNORE_DUP_ROW This option is for creating a nonunique clustered index. If enabled at index creation time on a table with duplicate data in it, SQL Server will:
If data is inserted into or updated in the table after the index is created, SQL Server will:
See the section titled "ROLLBACK TRAN," in Chapter 12, for more information on rolling back transactions.
NOTE: This option has no effect on nonclustered indexes. SQL Server internally assigns identifiers to the records being indexed and doesn't have to manage the physical order of the data according to the clustering.
ALLOW_DUP_ROW This option can't be combined with IGNORE_DUP_ROW on the same index. It controls the behavior of inserting or updating records under a nonunique clustered index. If ALLOW_DUP_ROW is enabled, no errors are returned and no data is affected when duplicate rows are created in a nonunique clustered index.
ON segment_name Specifying a segment for the index to reside on allows the placement of an index on a different segment of the data. This will improve performance of nonclustered indexes because multiple I/O handlers can be used to read and write from the index and data segments concurrently. Clustered indexes that have a segment name specified will move the data that's being indexed, as well as the index, to the indicated segment.
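A sketch, assuming a segment named indexsegment has already been defined (for example, with sp_addsegment) on a separate device:

/* The index pages are written to indexsegment, so index and data I/O
   can be serviced by different devices concurrently */
CREATE NONCLUSTERED INDEX au_city_ind ON authors (city) ON indexsegment
go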
The value of an index for helping SQL Server resolve a query largely depends on how accurately the index's data reflects the actual data in the database. SQL Server maintains heuristic, or trend, statistics on the data that the index contains to help it choose the appropriate index that will yield the least number of I/Os to get to the actual table data. Clustered indexes, when available, will almost always be chosen as a valid key over nonclustered indexes. Clustered indexes are favored because there is no physical I/O after the data is found in the index, since it's on the same page.
SQL Server's statistics-gathering engine is on an as-needed basis, in that the statistics are maintained only when an index is built or the statistics on those indexes are forced to be updated by the UPDATE STATISTICS statement. SQL Server doesn't maintain statistics on the fly purely for performance reasons. The additional overhead of maintaining the statistics dynamically generally isn't considered advantageous to fetching the data because it will consume on average more resources than would have been saved by the additional data. SQL Server's statistics indicate trends in the data and don't necessarily represent every key data element. These trends are what SQL Server uses to determine the best index to use.
The exception to when dynamic index statistics management would be beneficial is when an index is created on a table with no data in it. SQL Server has no way of knowing what the trends in the data are and, as a result, makes very basic assumptions of normal distribution. These assumptions are often very wrong after a number of records are added to the table. This results in SQL Server doing a table scan even though there's an appropriate index. In this situation, perform an UPDATE STATISTICS on the table, and the index's distribution information will be updated.
To determine the last time statistics information was updated on a given index, the DBCC command SHOW_STATISTICS can be used. An example of its use is shown in Listing 11.2:
/*-----------------------------
dbcc show_statistics( authors, fk_au_id )
------------------------------*/
Updated              Rows        Steps       Density
-------------------- ----------- ----------- ---------
Dec 10 1995  2:20PM  23          22          0.0434783

(1 row(s) affected)

All density              Columns
------------------------ -------
0.0434783                au_id

(1 row(s) affected)

Steps
-----------
172-32-1176
213-46-8915
238-95-7766
267-41-2394
274-80-9391
341-22-1782
409-56-7008
427-17-2319
472-27-2349
486-29-1786
527-72-3246
648-92-1872
672-71-3249
712-45-1867
722-51-5454
724-08-9931
724-80-9391
756-30-7391
807-91-6654
846-92-7186
893-72-1158
899-46-2035

(22 row(s) affected)

DBCC execution completed. If DBCC printed error messages, see your System Administrator.
In the output from the DBCC SHOW_STATISTICS command, you can see that the statistics have been kept up-to-date as of December 10th. In the second example, shown in Listing 11.3, the authors2 table is re-created with the same structures as authors, and the data is copied to it.
CREATE TABLE authors2 (
    au_id id NOT NULL,
    au_lname varchar (40) NOT NULL,
    au_fname varchar (20) NOT NULL,
    phone char (12) NOT NULL,
    address varchar (40) NULL,
    city varchar (20) NULL,
    state char (2) NULL,
    zip char (5) NULL,
    contract bit NOT NULL
)
GO
CREATE INDEX aunmind ON dbo.authors2(au_lname, au_fname) WITH FILLFACTOR = 5
GO
CREATE UNIQUE INDEX barny ON dbo.authors2(au_id) WITH IGNORE_DUP_KEY
GO
CREATE UNIQUE INDEX fk_au_id ON dbo.authors2(au_id) WITH IGNORE_DUP_KEY
GO
insert into authors2 select * from authors
GO
The previous listing was generated by doing a SQL script generation using Microsoft SQL Enterprise Manager and then changing the table name from authors to authors2. The same DBCC command reports very different results because the statistics are out of date, as shown in Listing 11.4:
/*-----------------------------
dbcc show_statistics( authors2, fk_au_id )
-----------------------------*/
Updated              Rows        Steps       Density
-------------------- ----------- ----------- -------
NULL                 23          0           0.0

(1 row(s) affected)

All density              Columns
------------------------ -------
0.0                      au_id

(1 row(s) affected)

Steps
-----------

(0 row(s) affected)

DBCC execution completed. If DBCC printed error messages, see your System Administrator.
Performing an UPDATE STATISTICS on the authors2 table and then doing the DBCC command yields the identical results to the initial DBCC on the authors table, shown in Listing 11.5:
/*------------------------------
update statistics authors2
go
dbcc show_statistics( authors2, fk_au_id )
------------------------------*/
Updated              Rows        Steps       Density
-------------------- ----------- ----------- ---------
Dec 11 1995 12:08PM  23          22          0.0434783

(1 row(s) affected)

All density              Columns
------------------------ -------
0.0434783                au_id

(1 row(s) affected)

Steps
-----------
172-32-1176
213-46-8915
238-95-7766
267-41-2394
274-80-9391
341-22-1782
409-56-7008
427-17-2319
472-27-2349
486-29-1786
527-72-3246
648-92-1872
672-71-3249
712-45-1867
722-51-5454
724-08-9931
724-80-9391
756-30-7391
807-91-6654
846-92-7186
893-72-1158
899-46-2035

(22 row(s) affected)

DBCC execution completed. If DBCC printed error messages, see your System Administrator.
Updating Statistics SQL Server has two ways of updating the statistics that relate to a table. The graphical way can be performed using SQL Enterprise Manager. The Transact-SQL command is UPDATE STATISTICS.
Using SQL Enterprise Manager to Update Statistics To use SQL Enterprise Manager to update statistics on a table, follow these steps:
FIG. 11.6
The authors table is selected in the Table box, and the aunmind
index is selected in the Index box.
FIG. 11.7
This dialog box uses the output of DBCC SHOW_STATISTICS to display the information.
FIG. 11.8
The Update Distribution Statistics message box confirms whether the index statistics
should be updated immediately or scheduled as a task.
By default, only the index highlighted in the Manage Indexes dialog box will have its statistics updated. If you want to update all the indexes on the selected table, select the Apply to All Indexes Of check box before either scheduling the task or executing it.
Using UPDATE STATISTICS The UPDATE STATISTICS statement is used to update the index statistics on a table or index. The syntax for UPDATE STATISTICS is as follows:
UPDATE STATISTICS [[database.]owner.]table_name [index_name]
The options for the Transact-SQL command UPDATE STATISTICS are as follows: table_name is the table whose index distribution statistics should be updated, and index_name optionally limits the update to a single index on that table. If index_name is omitted, statistics for all indexes on the table are updated.
NOTE: Performing an UPDATE STATISTICS on a database table can affect the plan that a stored procedure has generated for accessing data. Because stored procedures are compiled and stored in the procedure cache when they're first executed, they can store invalid access paths to data based on the index statistics at the time the procedure was first run. To force a stored procedure to refresh its access path, use the system-stored procedure sp_recompile and pass the table that was updated as a parameter. For example, sp_recompile authors will force all the procedures that use the authors table to be recompiled the next time they're executed.
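A brief usage sketch of the two commands named in the preceding note, run against the pubs authors table:

/* Refresh the distribution statistics for every index on authors */
UPDATE STATISTICS authors
go
/* Flag every stored procedure that references authors for recompilation
   the next time it's executed */
EXEC sp_recompile authors
go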
Forcing the Use of a Particular Index If SQL Server fails to pick an index that you know should provide better performance than the index it chose, you can force the use of an index by specifying it in the FROM clause. To force an index, use the optimizer hints or (INDEX = ...) section of the SELECT statement's syntax. In simplified syntax, here's a SELECT statement:
SELECT ...
FROM table_name (INDEX = n)   /* optimizer hints are placed after the table */
...
See Chapter 7, "Retrieving Data with Transact-SQL," for more information on the syntax of SELECT statements.
The INDEX keyword tells SQL Server to use the index specified by the numeric n. If n equals zero (0), SQL Server will table scan. If n equals one (1), SQL Server will use the clustered index if one is in the table. The other values of n are determined by the number of indexes on the table.
TIP: An index name can also be used in the optimizer hint instead of an identifying id number.
Listing 11.6 shows SQL Server using the optimizer hints when selecting from the authors table.
/* Turn on statistics IO, so that the results can be seen */
set statistics io on
go
/* Basic Select with no hints to show the optimizer choosing the clustered index */
Select AU_ID, AU_FNAME
From AUTHORS
Where AU_ID between '172-32-1176' and '238-95-7766'
Order By AU_ID
go
/* Force a table scan */
Select AU_ID, AU_FNAME
From AUTHORS (INDEX = 0)
Where AU_ID between '172-32-1176' and '238-95-7766'
Order By AU_ID
go
/* Force the clustered index */
Select AU_ID, AU_FNAME
From AUTHORS (INDEX = 1)
Where AU_ID between '172-32-1176' and '238-95-7766'
Order By AU_ID
go
/* Force the first alternate index */
Select AU_ID, AU_FNAME
From AUTHORS (INDEX = 2)
Where AU_ID between '172-32-1176' and '238-95-7766'
Order By AU_ID
go

The output is as follows:

AU_ID       AU_FNAME
----------- --------------------
172-32-1176 Johnson
213-46-8915 Marjorie
238-95-7766 Cheryl

(3 row(s) affected)

Table: authors scan count 1, logical reads: 1, physical reads: 0

AU_ID       AU_FNAME
----------- --------------------
172-32-1176 Johnson
213-46-8915 Marjorie
238-95-7766 Cheryl

(3 row(s) affected)

Table: authors scan count 1, logical reads: 1, physical reads: 0
Table: Worktable scan count 0, logical reads: 4, physical reads: 0

AU_ID       AU_FNAME
----------- --------------------
172-32-1176 Johnson
213-46-8915 Marjorie
238-95-7766 Cheryl

(3 row(s) affected)

Table: authors scan count 1, logical reads: 2, physical reads: 0

AU_ID       AU_FNAME
----------- --------------------
172-32-1176 Johnson
213-46-8915 Marjorie
238-95-7766 Cheryl

(3 row(s) affected)

Table: authors scan count 1, logical reads: 29, physical reads: 0
Table: Worktable scan count 0, logical reads: 4, physical reads: 0
CAUTION: The effects of forcing an index are clearly shown in these examples. The last example shows an extremely expensive option being forced on the server. You can cause major performance problems by forcing index use, so overriding the optimizer's index selection is generally not recommended. Forcing index selection in a query is also dangerous if the application code is left unchanged while the indexes are changed or rebuilt: the query may then force the use of indexes that no longer provide optimal performance, causing severe performance degradation.
SQL Server has two ways to show information about indexes. The graphical method is via SQL Enterprise Manager's Index Manager. The command-line method is via the system-stored procedure sp_helpindex and the ODBC stored procedure sp_statistics.
SQL Enterprise Manager's Index Manager has been discussed in detail in previous sections in this chapter. Please refer to the section "Using SQL Enterprise Manager to Update Statistics" for instructions on how to view the statistics associated with an index.
sp_helpindex The system-stored procedure sp_helpindex has been provided to get information about indexes. The syntax for the procedure's use is
sp_helpindex table_name
table_name should be replaced with an unqualified table name. If the table you want to inquire on isn't in the active database, you must change to the required database before executing this procedure.
sp_helpindex will return the first eight indexes that are found on a database table. In Listing 11.7, sp_helpindex shows all the indexes on the authors table:
/*----------------------------
sp_helpindex authors
----------------------------*/
index_name     index_description                                               index_keys
-------------- --------------------------------------------------------------- ------------------
UPKCL_auidind  clustered, unique, primary key located on default               au_id
aunmind        nonclustered located on default                                 au_lname, au_fname
barny          nonclustered, ignore duplicate key, unique located on default   au_id
fk_au_id       nonclustered, ignore duplicate key, unique located on default   au_id

(1 row(s) affected)
sp_statistics sp_statistics is a special stored procedure that has been created to help Microsoft "publish" information for the ODBC interface to the database. Microsoft created this stored procedure so that an ODBC driver could retrieve all the relevant information about an index from a single call to the database. The information returned can be gathered in a number of other ways, but it's often convenient to use sp_statistics to summarize all the relevant information on a table. The syntax for sp_statistics is as follows:
sp_statistics table_name [, table_owner] [, table_qualifier] [, index_name] [, is_unique]
The options for the system-stored procedure sp_statistics are as follows:
TIP: Many stored procedures take several parameters. To save time, rather than specifying all the parameters, you can supply a particular one by placing an @ sign in front of the parameter name. An example is sp_statistics authors, @is_unique = 'Y'
SQL Server has two ways of dropping indexes on a table. The graphical way can be performed by using SQL Enterprise Manager. The command-line way is by using the SQL statement DROP INDEX.
Using SQL Enterprise Manager to Drop an Index To use SQL Enterprise Manager to drop an index, follow these steps:
FIG. 11.9
The Index Removal message box confirms whether you want the index removed immediately
or scheduled as a task.
Using the DROP INDEX Command To remove an index using Transact-SQL, use the DROP INDEX statement. The syntax for DROP INDEX is as follows:
DROP INDEX [owner.]table_name.index_name [, [owner.]table_name.index_name...]
The options for the Transact-SQL command DROP INDEX are as follows: owner is the optional owner of the table, table_name is the table containing the index, and index_name is the index to be removed. Multiple indexes can be dropped in one statement by separating the fully qualified index names with commas.
The following example drops the barny index on the authors table:
Drop Index authors.barny
No output is generated after executing this command.
Keys and indexes are often synonymous in databases, but in SQL Server a slight difference exists between them. In SQL Server, keys can be defined on tables and then used as referential integrity constraints in the same way as the ANSI standard for SQL.
A primary key is a column or set of columns whose values uniquely identify the rows in a database table. In this sense, a primary key performs the same integrity role as a unique index on a table. Keep in mind, though, that SQL Server allows only one primary key to be defined for a table, whereas there can be many unique indexes. Primary keys enforce uniqueness by creating a unique index on the table on which they're placed.
Foreign keys are columns in a table that correspond to primary keys in other tables. The relationship of a primary key to a foreign key defines the domain of values permissible in the foreign key. The domain of values is equivalent to a distinct list of values in the corresponding primary key. This foreign key domain integrity is a useful way of enforcing referential integrity between associated sets of columns. Foreign keys don't create indexes on the table when the key is created.
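Because declaring a foreign key doesn't build an index, you typically create one yourself on the referencing column when it's a frequent join target. For example, in the pubs database the titles table's pub_id column references publishers, so a supporting index might look like this (the index name is just illustrative):

/* Index the foreign key column to speed joins back to publishers */
CREATE INDEX pub_id_ind ON titles (pub_id)
go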
Starting in Version 6.0, primary and foreign keys in SQL Server offer much of the functionality that previously had to be coded with triggers. In earlier versions of SQL Server, primary and foreign keys weren't much more than documentation and were useful mainly to third-party programs that needed to know key information about a table. Keys now provide needed functionality and should be used to enforce referential integrity.
In SQL Server you can add primary and foreign keys in two ways. The graphical method is performed by using SQL Enterprise Manager. The command-line method is done by using Transact-SQL commands ALTER TABLE...ADD CONSTRAINT, or by specifying PRIMARY/FOREIGN KEY in the CREATE TABLE statement.
Using SQL Enterprise Manager to Add Primary and Foreign Keys To use SQL Enterprise Manager to add a primary key, follow these steps:
FIG. 11.10
The authors table is selected in the top combo box, and a key icon is in the Key
column of the au_id row.
FIG. 11.11
This figure shows SQL Enterprise Manager's Manage Tables window with the Advanced
Features visible and the Primary Key/Identity tab active.
FIG. 11.12
The Primary Key has been reentered and the Add button is now
enabled to allow you to create the Primary Key on the table.
Using CREATE TABLE...PRIMARY KEY The CREATE TABLE syntax has a place for adding a PRIMARY KEY or a FOREIGN KEY in the CONSTRAINT section. A simplified syntax of the CREATE TABLE is shown as follows:
CREATE TABLE table_name ( column_name data_type CONSTRAINT ...,...)
In Listing 11.8, you'll see tables created in different styles with different types of CONSTRAINTs.
/* create a table where the primary key name is not specified, and
   the database will assign it */
Create TABLE TABLE_A (
    COLUMN_A smallint PRIMARY KEY)
go
/* Now create a primary key specifying the name */
Create TABLE TABLE_B (
    COLUMN_B smallint CONSTRAINT PK_COLUMN_B PRIMARY KEY)
go
/* Now create a foreign key referencing TABLE_A */
Create TABLE TABLE_C (
    COLUMN_C smallint FOREIGN KEY REFERENCES TABLE_A(COLUMN_A))
go
/* Now Create a multi-column primary key (a multi-column key must be
   declared as a table-level constraint) */
Create TABLE TABLE_D (
    COLUMN_D1 smallint,
    COLUMN_D2 smallint,
    CONSTRAINT PK_D_COLUMNS PRIMARY KEY (COLUMN_D1, COLUMN_D2))
go
/* now create a foreign key referencing the multi-column primary key */
Create TABLE TABLE_E (
    COLUMN_E1 smallint,
    COLUMN_E2 smallint,
    FOREIGN KEY (COLUMN_E1, COLUMN_E2)
        REFERENCES TABLE_D(COLUMN_D1, COLUMN_D2))
go
NOTE: When you add a PRIMARY KEY to a table with the ALTER TABLE...ADD CONSTRAINT syntax or in the CREATE TABLE statement, if you don't indicate any parameters for the key, a clustered, unique index is created on the table. To specify a nonclustered index, add NONCLUSTERED immediately after PRIMARY KEY to the statement.
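A minimal sketch of that variation, using a hypothetical TABLE_F:

/* The primary key still gets a unique index, but a nonclustered one,
   leaving the single allowed clustered index free for another column */
Create TABLE TABLE_F (
    COLUMN_F smallint NOT NULL CONSTRAINT PK_COLUMN_F PRIMARY KEY NONCLUSTERED)
go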
Using ALTER TABLE...ADD CONSTRAINT The ALTER TABLE...ADD CONSTRAINT syntax is very similar to the CREATE TABLE logic. In Listing 11.9, the same tables are created, but the ALTER TABLE syntax is used to add the keys.
/* create the table */
Create TABLE TABLE_A (
    COLUMN_A smallint)
go
/* add the basic primary key without specifying the name */
Alter Table TABLE_A
    ADD PRIMARY KEY (COLUMN_A)
go
/* create the table */
Create TABLE TABLE_B (
    COLUMN_B smallint)
go
/* add the primary key specifying the name */
Alter Table TABLE_B
    ADD CONSTRAINT PK_COLUMN_B PRIMARY KEY (COLUMN_B)
go
/* create the table */
Create TABLE TABLE_C (
    COLUMN_C smallint)
go
/* Now create a foreign key referencing TABLE_A */
Alter Table TABLE_C
    ADD FOREIGN KEY (COLUMN_C) REFERENCES TABLE_A(COLUMN_A)
go
/* create the table */
Create TABLE TABLE_D (
    COLUMN_D1 smallint,
    COLUMN_D2 smallint)
go
/* Now add the multi-column primary key */
Alter Table TABLE_D
    ADD CONSTRAINT PK_D_COLUMNS PRIMARY KEY (COLUMN_D1, COLUMN_D2)
go
/* create the table */
Create TABLE TABLE_E (
    COLUMN_E1 smallint,
    COLUMN_E2 smallint)
go
/* now add the foreign key referencing the multi-column primary key */
Alter Table TABLE_E
    ADD CONSTRAINT FK_E_COLUMNS FOREIGN KEY (COLUMN_E1, COLUMN_E2)
        REFERENCES TABLE_D(COLUMN_D1, COLUMN_D2)
go
NOTE: Microsoft SQL Server 6.5 adds two new options to the ALTER TABLE...ADD CONSTRAINT syntax. These are WITH CHECK | NOCHECK, and NOT FOR REPLICATION. The WITH NOCHECK option is provided so that a constraint can be added without checking the existing data for referential integrity violations. Microsoft added the NOT FOR REPLICATION option to allow replication to occur without requiring constraints to be dropped and re-added after the replication took place.
See "Creating and Using Constraints," for more information on constraints, Chapter 6.
SQL Server has two ways to show information about keys. The graphical method is via SQL Enterprise Manager's Table Manager. The command-line method is via the system-stored procedures sp_help and sp_helpconstraint, and the ODBC stored procedures sp_pkeys and sp_fkeys.
SQL Enterprise Manager's Table Manager has been discussed in detail in previous sections in this chapter. Please refer to the section titled "Using SQL Enterprise Manager To Add Primary And Foreign Keys" for information on how to view the constraints on a table.
sp_helpconstraint SQL Server's primary way of displaying information about keys is through the system-stored procedure sp_helpconstraint. Its syntax is as follows:
sp_helpconstraint table_name
sp_help sp_help is a generic system-stored procedure that returns information about database tables. Part of the output from sp_help is information on keys on a table. The syntax for sp_help is
sp_help table_name
sp_pkeys and sp_fkeys SQL Server provides two system-stored procedures, sp_pkeys and sp_fkeys, that can be used to view key information stored in the database. sp_pkeys and sp_fkeys are procedures that have been created to help ODBC implementers access SQL Server's system catalog tables easily. The syntax for the two procedures is identical and is as follows:
sp_pkeys | sp_fkeys table_name
table_name is the table for which the keys need to be found.
Examples of Using System-Stored Procedures to View Primary and Foreign Keys Listing 11.10 shows some examples of the output from sp_helpconstraint, sp_help, sp_pkeys, and sp_fkeys:
/*-----------------------------
sp_helpconstraint TABLE_D
-----------------------------*/
Object Name
-------------------------
TABLE_D

constraint_type          constraint_name    constraint_keys
------------------------ ------------------ --------------------
PRIMARY KEY (clustered)  PK_D_COLUMNS       COLUMN_D1, COLUMN_D2

Table is referenced by
------------------------------------------------------------------
pubs.dbo.TABLE_E: FK_COLUMNS

/*-----------------------------
sp_help table_d
-----------------------------*/
Name     Owner  Type        When_created
-------- ------ ----------- -------------------
TABLE_D  dbo    user table  Dec 11 1995  7:42PM

Data_located_on_segment
------------------------------
default

Column_name  Type      Length  Prec  Scale  Nullable
------------ --------- ------- ----- ------ --------
COLUMN_D1    smallint  2       5     0      no
COLUMN_D2    smallint  2       5     0      no

Identity                      Seed     Increment
----------------------------- -------- ---------
No identity column defined.   (null)   (null)

index_name    index_description                                  index_keys
------------- -------------------------------------------------- --------------------
PK_D_COLUMNS  clustered, unique, primary key located on default  COLUMN_D1, COLUMN_D2

constraint_type          constraint_name  constraint_keys
------------------------ ---------------- --------------------
PRIMARY KEY (clustered)  PK_D_COLUMNS     COLUMN_D1, COLUMN_D2

Table is referenced by
----------------------
pubs.dbo.TABLE_E: FK_COLUMNS

/*-----------------------------
sp_pkeys table_d
-----------------------------*/
table_qualifier  table_owner  table_name  column_name  key_seq  pk_name
---------------- ------------ ----------- ------------ -------- ------------
pubs             dbo          TABLE_D     COLUMN_D1    1        PK_D_COLUMNS
pubs             dbo          TABLE_D     COLUMN_D2    2        PK_D_COLUMNS

(2 row(s) affected)

/*-----------------------------
sp_fkeys table_d
-----------------------------*/
pktable_qualifier  pktable_owner  pktable_name  pkcolumn_name  fktable_qualifier  fktable_owner  fktable_name  fkcolumn_name  key_seq  update_rule  delete_rule  fk_name     pk_name
------------------ -------------- ------------- -------------- ------------------ -------------- ------------- -------------- -------- ------------ ------------ ----------- ------------
pubs               dbo            TABLE_D       COLUMN_D1      pubs               dbo            TABLE_E       COLUMN_E1      1        1            1            FK_COLUMNS  PK_D_COLUMNS
pubs               dbo            TABLE_D       COLUMN_D2      pubs               dbo            TABLE_E       COLUMN_E2      2        1            1            FK_COLUMNS  PK_D_COLUMNS
SQL Server has two methods for dropping primary and foreign keys. The graphical method is performed by using SQL Enterprise Manager. The command-line method is done by using the Transact-SQL command ALTER TABLE...DROP CONSTRAINT.
Using SQL Enterprise Manager To use SQL Enterprise Manager to drop a key, follow these steps:
Using ALTER TABLE...DROP CONSTRAINT To drop a foreign key using SQL, use the ALTER TABLE...DROP CONSTRAINT statement. The syntax for this SQL statement is as follows:
ALTER TABLE table_name DROP CONSTRAINT constraint_name
The table_name is the name of the table that the constraint applies to. The constraint_name is the name of the constraint.
NOTE: You can't drop a primary key if other tables reference it as a foreign key. You must drop those foreign keys first.
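Using the tables from Listing 11.9, dropping the keys in the required order would look something like this:

/* Drop the referencing foreign key first... */
Alter Table TABLE_E DROP CONSTRAINT FK_E_COLUMNS
go
/* ...then the primary key it referenced can be dropped */
Alter Table TABLE_D DROP CONSTRAINT PK_D_COLUMNS
go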
If ever there were a list of double-edged swords in SQL Server configuration and management, indexes would be very near the top of it. The typical cycle is to create an index to individually satisfy each and every SELECT statement, and then to discover that performance is taking a serious hit from all of that indexing. From that point on, the discussion, and often heated debate, about indexes comes around to "the fewer the better, but make the ones you implement count."
One thing to keep an eye on is the use of the update statistics option. In the first part of this chapter, this option was covered and it was explained that the statistics managed by SQL Server are only as good as the data loading at the time the index is created. When you implement a system, you'll typically do so with either empty database tables or a sample database set. Then, as the system is accepted, the databases grow significantly in size.
The model data you had in the system for testing may, in actuality, bear little resemblance to the real-life data in the system. From the pattern of information entered, to the values in the columns, there's nothing like real-life use to determine real-life data patterns. After you've had the system up for a bit, make sure you update statistics. This is critical. In many systems we've done in our integration work, this step alone boosts performance dramatically. Consider making the update process a scheduled process, perhaps once every six months at sites where data distribution is relatively stable, more often at sites where the mix of information is more dynamic.
If you're creating a system that is built to support more query options, consider indexing every column on which you allow queries. Now, before a bunch of e-mail is sent out, be sure that systems you consider this option for are query-only type systems, not update systems. As indicated earlier, query-related systems don't suffer from too many different indexes. Don't do this on tables where you are inserting information as well as updating it.
In this chapter, you learned how to create, view, and manage indexes on your data tables. This information is very important to help you create an optimized database that won't be bogged down by user queries that force table scans.
From here, you should look at the following chapters for more information: