Teach Yourself Microsoft SQL Server 6.5 in 21 Days

Previous chapterNext chapterContents


Day 13

Indexing


On Day 12, "Data Modification," you looked at data modification, including INSERT, UPDATE, and DELETE. These are the commands that allow you to add data to the database, as well as modify and eliminate it. All of these operations are necessary for the normal day-to-day work of SQL Server.

Today's lesson focuses on indexes. Indexes provide a set of logical pointers to your data, as much as an index in the back of a book helps you find things you're looking. Although all the queries you have seen so far work without indexes (select, insert, update, delete), they usually run faster with them.

Index Basics

Why index? Well, there can be many reasons. The most obvious is speed. Without indexes, SQL Server accesses data by reading every page of data on each table you have specified in your SQL statement. This table scan (reading every page of the data) can be an excellent method of data retrieval. For instance, if a table is small, or if you are accessing a large portion of the table, a table scan may very well be the best plan to access the data. However, quite frequently, data access is much faster with an index. It can also speed up joins between tables.

Enforcing uniqueness is another reason to create an index. It is not an error condition to have two rows in a table that are exactly the same. However, that's probably not how most people want to store data. Imagine a system that keeps track of customers. If you couldn't tell your customers apart, you might have difficulty keeping customers. You may have several options to uniquely identify your customers. You could give them numbers, use their names and birthdates together, their credit card number, or some other value or set of values. Regardless of the choice you've made, the way to tell SQL Server about your choice is with a unique index. Day 14, "Data Integrity," discusses another way to enforce uniqueness: the unique constraint (but even then SQL Server is creating a unique index).

Index Structures

An index typically consists of a set of pages known as a B+ tree. A B+ tree lookd something like that shown in Figure 13.1.

Figure 13.1. A B+ tree.

As mentioned, an index is the way data is found quickly. To find an individual row of data, navigate the B+ tree to find that row, and then move on to the individual data row. Start with the root page. A pointer to the root page is found in the sysindexes table (oddly enough, in the column named root for nonclustered indexes). The root page contains index entries (the data for the column or columns you've indexed), as well as pointers to each page below the root page. Each index may have one or more intermediate levels. Again, each entry would have an index value, and a pointer to the next page.

You find an entry for every row in the table being indexed on the leaf pages, as well as a pointer to the data page and row number that has the rest of the columns in the table.

The data itself is stored on pages, which are called data pages (there's no sense in making this hard). Each page is 2,048 bytes in size, with a header of 32 bytes. Hence, each page has 2,016 bytes available for storage. Each of the above pages has the same basic structure.

Figure 13.2 displays an example of what an index might look like. This index is on a first name column.

Figure 13.2. A sample B+ index.

Each level of the index is a doubly linked list. Each page knows about the page before it, and the page that is logically after it. On the root and intermediate levels of the index, each index value is the first value on the page of the next level below it. On the leaf level of the index, you find one entry for each row in the table. The index is sorted based on the column you have chosen as your index key. This does not change the data's physical sort order.

When you modify data in the table, every index on that table is also modified. SQL Server guarantees consistency between the data in your tables and their indexes. This is good in the sense that you want to have excellent data integrity. However, it also means that INSERT, UPDATE, and DELETE operations that may have been previously quick, now may have more work to do. If you want to add a new row with the INSERT statement, it normally takes two I/Os (one for the data and one for the log). If you have two indexes on the table, it takes at least two more, and perhaps more than that. You must balance your needs between data modifications and faster queries.

How do you use this? Use Figure 13.2 to find a row of data. Imagine the index in the figure is on first name. Look on the root page to find the row with a first name of Donald. Because Donald is "less" than John, you follow the entry for Andy to page 6. On page 6 you find that Donald is "higher" than David but "less" than Hank, so you go to page 22. On page 22, you find that the entry for Donald points to page 1 (which is a data page, because this is the index's leaf level). In SQL Server, you also find the number of the row that contains Donald as the first name (it was left off the figure for simplicity's sake). Now read the row from page 1, and you can access any information about Donald that is stored on the page. This is exactly how SQL Server uses an index to find data.

Index Options

There are a number of options for indexes available to you. You need to specify these choices and understand them before you can create your indexes.

Clustered Indexes

There are two options for physical storage of your indexes. The first type is known as clustered. A clustered index is one that physically re-sorts the data. Rather than having a completely separate index structure (such as the one previously described), the leaf level of the index is the data. Accessing data using a clustered index is almost always faster than using a non-clustered index, because the additional data lookup from the leaf level of the index is not necessary.

Because the data is physically sorted in the order of the index key, you can only have one clustered index on a table (you wouldn't want to keep multiple copies of the data). Because you only have one clustered index available, you must choose it carefully. Choosing a clustered index can be an fairly complex topic, but some basic guidelines are included here.

Free space is one major issue that comes up with clustered indexes. Creating a clustered index requires that at least 120 percent of the table's size be available as temporary workspace. This free space must exist in the database where you are creating the index. To create the index, SQL Server copies the table, sorts the copy in the order of the index values (in ascending order), builds the index structures (the root page and any needed intermediate pages), then drops the original table and points the entries in the sysindexes table to the new structure. When this operation is complete, the clustered index only takes about 5 percent more space than the table itself. The overhead (the non-data pages) of this type of index is relatively small.

Because you only need the free space during the index creation, this may not sound too bad--but can be difficult to justify. At least 120MB of unused database space is required if you have a 500MB database, but one table in the database is 100MB, and you want to create a clustered index on that table.

Figure 13.3 shows an example of a clustered index. Notice that the data is sorted in the order of the index key (first name again), and that the data on the leaf level of the index (the data page) is sorted by first name.

Figure 13.3. A clustered index example.

Non-Clustered Indexes

A non-clustered index is basically the same as a standard B+ tree index. Each index has a root page, one or more levels of intermediate pages, and a leaf level, which contains one row for each row in the table. Non-clustered indexes require more space overall than clustered indexes, but take much less space to create.

You can have up to 249 non-clustered indexes on a single table. The order that you create them is insignificant. When you create a non-clustered index, it does not change the order of the data, as does a clustered index. The rows in the leaf level of the index are sorted in the order of the columns chosen as part of the index. Each row contains a pointer to the page number/row number combination of the data in the table. Refer to Figure 13.2 to see an example of a non-clustered index.

Unique/Non-Unique Indexes

Uniqueness determines whether duplicate values are allowed in your index. For instance, in the previous firstname index example, no two people are allowed to have the same first name if the index is unique. The default for SQL Server indexes are non-unique, meaning that duplicate values are allowed.

If your data supports it, making an index unique can significantly improve performance when using that index. No more index searching is necessary after the value you are searching for is found (because you know there is only one entry, you can stop looking for more after you've found it).

Single Column/Multi-Column Indexes

Many indexes have only one column; however, it is easy to create a multi-column index. Multi-column indexes can be quite useful, because SQL Server can typically only use one index per table during a single query. If you specify both columns together, they are an excellent candidate for what's known as a composite index (another name for an index with multiple columns). Composite indexes can be clustered or non-clustered. Composite indexes can contain from two to 16 columns.

SQL Server Indexes

SQL Server implements B+ trees to build its indexes. You use the create index statement to create the indexes you need.

Create Index Syntax

CREATE [UNIQUE] [CLUSTERED | NONCLUSTERED] INDEX index_name
     ON [[database.]owner.]table_name (column_name [, column_name]...)
[WITH [PAD_INDEX, ]
[FILLFACTOR = x]
[[,] IGNORE_DUP_KEY]
[[,] {SORTED_DATA | SORTED_DATA_REORG}]
[[,] {IGNORE_DUP_ROW | ALLOW_DUP_ROW}]]
[ON segment_name]

For example, the following lines of code create an index on the au_id column of the authors table. The index would be a clustering index, and would enforce uniqueness.

Use pubs
Create unique clustered index myauind on authors (au_id)

The following creates a non-unique, nonclustered index on the au_fname column of the same table.

Use pubs
Create index mynamindex on authors (au_fname)

The Fillfactor and Pad_Index Options

The fillfactor option specifies how full each page in the leaf level of an index should be. The default fillfactor is 0. Fillfactor is a configuration parameter, so ensure check that it has not been changed. Remember that you can check this from Enterprise Manager, or you can run sp_configure without any parameters. Your output looks something like this:

Exec sp_configure
go
name                  minimum    maximum     config_value    run_value
----                  --------   --------    ------------    ----------
allow updates         0              1        0              0
backup buffer size    1             32        1              1
backup threads        0             32        5              5
database size         2          10000        2              2
default language      0           9999        0              0
fill factor           0            100        0              0
...

Look for the name fill factor in the output, and verify that the config_value and run_value columns both reflect a fillfactor setting of 0.

If you do not specify the fillfactor setting during the create index statement, the default value (typically 0) is used. A value of 0 means that the leaf pages of your index are full, but that non-leaf pages (intermediate pages and the root page) still have room for at least one more row (two for non-unique clustered indexes). If the fillfactor is 100, it means all the leaf pages will be 100 percent full, with no room for additional rows. Any other value is the percentage of each leaf page to fill with rows. SQL Server rounds the percentage to the nearest row size, so you rarely get exactly the percentage you ask for, but it is as close as SQL Server can get it.

If you created a clustered index with a fillfactor of 50, each page would be 50 percent full. In this figure, you can see that the leaf page of the clustered index is only half-full. The code might look like this:

CREATE INDEX aunameindex on authors (au_fname)
WITH FILLFACTOR = 50
GO

The index would look similar to Figure 13.4.

Figure 13.4. A clustered index with fillfactor=50.


NOTE: Fillfactor is not maintained on the index. If you create an index with a fillfactor of 50 (meaning each page is half full), over time it is likely that some pages will fill up, and others may get close to empty. You will have to do one of several things if you would like to see your fillfactor reestablished.

Each option reinstates your fillfactor.


The PAD_INDEX option, when used with a fillfactor, specifies that the non-leaf pages of the index have the fillfactor applied to them as well as the leaf pages.

The easiest way to understand the PAD_INDEX option is to look at an example.

If you create the same index as shown in Figure 13.4, but add the PAD_INDEX option, the non-leaf pages also have the fillfactor applied. If your rows don't fit perfectly, SQL Server gets as close as possible to the fillfactor you've requested. For example, the following might create an index like this Figure 13.5.

CREATE INDEX aunameindex on authors (au_fname)
WITH FILLFACTOR = 50, PAD_INDEX
GO

Figure 13.5. A clustered index with pad_index.

All of this has explained how fillfactor and pad_index work--not why you would want to use them. There can be many times when it can be quite useful to specify a fillfactor. To understand why that's the case, you need to know a couple of terms. The first is a page split. A page split occurs when a SQL Server page is full and another row needs to be put on that page. A new page is assigned to the index or table, and 50 percent of the rows are moved to the new page. The new row is added in the appropriate location. As you might imagine, this can be an expensive operation if it happens frequently. If you use a fillfactor when you create your index, you can allow new rows to be added without causing page splits. Setting the appropriate fillfactor will likely improve performance. However, getting the proper fillfactor comes from experience.

This can be particularly useful with clustered indexes, because the the actual data pages are the index's leaf level. Be aware, however, that applying a fillfactor such as 50 to the clustered index approximately doubles the size of the table. This can have a very dramatic impact on space usage, and should be done with some caution. Because of the way non-clustered indexes are stored, page splitting on the clustered index can be particularly expensive in terms of its impact. Each row in the non-clustered index has a pointer to the data row's page number/row number combination. If a page split on the clustered index occurs, approximately half of the rows move. That means half the rows have to modify each entry for each non-clustered index. All of these modifications occur while you wait for your data insert or update to occur, resulting in slower response times.

When you add pad_index, you can avoid non-leaf page splits as well. This increases space usage, but not nearly as much as the fillfactor.

Sorted Data Options

The SORTED_DATA and SORTED_DATA_REORG options only apply to clustered indexes.

SORTED_DATA

If you need to drop and recreate your clustered index and the data is still correctly sorted, you can avoid having SQL Server perform the sort operation that is part of a clustered index creation. The data may still be correctly sorted if you are either sure data you've added is in the proper order, or you haven't changed the data since dropping the index. The lack of a sort step can significantly speed your index creation. When the clustered index is created, SQL Server verifies that the data is in the proper order. The create index statement fails if the data is not in the proper sequence. You could use this option if you're trying to optimize data loads by dropping all the indexes on your table, loading the data, and then recreating the indexes. SQL Server doesn't make a second copy of the data, and therefore, it doesn't take much space to recreate the index when using this option.

SORTED_DATA_REORG

This option makes a copy of the data pages, just as a normal creation of a clustered index would. It physically "cleans" the data, reapplying a fillfactor, and cleaning partially used pages. It still avoids the sort operation, with the same check to verify that the data is in the proper sequence that occurs with the SORTED_DATA option. You use this option when recreating a clustered index, but have the intention of reorganizing your data pages. If you have pages that are partially used, or generally could benefit from being compacted, then you may want to use this option. The SORTED_DATA_REORG option still requires the same amount of free space as a normal create index statement: 120 percent or more of the size of the table.

The IGNORE_DUP_KEY, IGNORE_DUP_ROW, and ALLOW_DUP_ROW options are advanced options and are rarely used. There's no need to explore them here.

Index Usage

You've examined index basics and how to create SQL Server indexes. However, you haven't seen which indexes to create or when to use them.

When Will My Index Be Used?

You can start with the most obvious question: How do I know when an index will be used? When a query is submitted to SQL Server, the query is broken apart and analyzed. In terms of optimization for indexing, the most important part of the query is the WHERE clause. A statement in the WHERE clause of a query is the most likely way that SQL Server will know how to optimize your query to use an index. However, just because your query contains a WHERE clause doesn't mean that an index will be used. Take this query, for example:

SELECT au_id, au_fname, au_lname
FROM pubs..authors
WHERE state = `CA'
GO

If there was an index on the state column, but most of the authors came from California, then there's no point to using an index. A table scan (reading every data page in the table) will most likely be the most efficient plan to get the data. When the phrase "the most efficient plan" is used, it means efficient in terms of minimizing the number of pages that need to be read by SQL Server. The other term for this is minimizing I/O (input/output) and logical page reads.

Given a similar query that returns only one row, it will most likely make sense to use an index. For instance, if there were a clustered index on the au_id column, the following query would most likely use the index:

SELECT au_id, au_fname, au_lname
FROM pubs..authors
WHERE au_id = "341-22-1782"

How Do You Verify Index Usage?

The next question: How can you verify which index will be used by your query? Use the Transact-SQL statement SET SHOWPLAN ON to see SQL Server's index selection. To view how indexes will be used, create a table that's big enough to have indexes used. Run the following script to create a table and insert some data into that table:

CREATE TABLE PUBS..INDEXTAB
(col1 int not null,
 col2 char(250) not null,
 col3 char(250) not null,
 col4 char(250) not null,
 col5 char(250) not null)
insert indextab values (1,'adam','col3','col4','col5')
insert indextab values (2,'bob','col3','col4','col5')
insert indextab values (3,'charles','col3','col4','col5')
insert indextab values (4,'david','col3','col4','col5')
insert indextab values (5,'edward','col3','col4','col5')
insert indextab values (6,'frank','col3','col4','col5')
insert indextab values (7,'george','col3','col4','col5')
insert indextab values (8,'hank','col3','col4','col5')
insert indextab values (9,'ida','col3','col4','col5')
insert indextab values (10,'john','col3','col4','col5')
insert indextab values (11,'kim','col3','col4','col5')
insert indextab values (12,'loni','col3','col4','col5')
insert indextab values (13,'mike','col3','col4','col5')
insert indextab values (14,'nikki','col3','col4','col5')
insert indextab values (15,'oprah','col3','col4','col5')
insert indextab values (16,'paul','col3','col4','col5')
insert indextab values (17,'quan','col3','col4','col5')
insert indextab values (18,'richard','col3','col4','col5')
insert indextab values (19,'sam','col3','col4','col5')
insert indextab values (20,'tom','col3','col4','col5')
insert indextab values (21,'uma','col3','col4','col5')
insert indextab values (22,'vera','col3','col4','col5')
insert indextab values (23,'walter','col3','col4','col5')
insert indextab values (24,'xray','col3','col4','col5')
insert indextab values (25,'yuma','col3','col4','col5')
insert indextab values (26,'zane','col3','col4','col5')
insert indextab values (27,'ann','col3','col4','col5')
insert indextab values (28,'bill','col3','col4','col5')
insert indextab values (29,'cathy','col3','col4','col5')
insert indextab values (30,'dawn','col3','col4','col5')
insert indextab values (31,'ellen','col3','col4','col5')
insert indextab values (32,'fran','col3','col4','col5')
insert indextab values (33,'grant','col3','col4','col5')
insert indextab values (34,'helen','col3','col4','col5')
insert indextab values (35,'irwin','col3','col4','col5')
insert indextab values (36,'jack','col3','col4','col5')
insert indextab values (37,'kathy','col3','col4','col5')
insert indextab values (38,'lance','col3','col4','col5')
insert indextab values (39,'molly','col3','col4','col5')
insert indextab values (40,'nancy','col3','col4','col5')
CREATE INDEX CL_MYINDEX on indextab (col1)
SET SHOWPLAN ON
GO
select col1, col2 from indextab
where col2 = `ann'GO

The output generated by these statements would look similar to this:

STEP 1
The type of query is SELECT
FROM TABLE
indextab 
Nested iteration
Table Scan
col1        col2

---------- ------------------------ 
27          ann

(1 row(s) affected)
...

As you can see from the output of the showplan, SQL Server chose to use a table scan to run this query. You see the results of your query after the results of the showplan are put out. This leads you to the first adjustment you make. SQL Server doesn't need to actually run the query to tell you how it will use available indexes. You can use the statement SET NOEXEC ON to cause SQL Server's display of the results from the showplan option, without actually processing the query and returning the data to you. In the next set, you can see that you get the results of the showplan, and do not get the results from the query. Be careful that you have SET NOEXEC ON as the last statement before the GO command. If you have it before the SET SHOWPLAN ON statement, the SET SHOWPLAN statement won't be run.

SET SHOWPLAN ON
SET NOEXEC ON
GO
select col1,col2 
from indextab
where col1 < 3

And the output from these statements:

STEP 1
The type of query is SETON
STEP 1
The type of query is SETON
STEP 1
The type of query is SELECT
FROM TABLE
indextab 
Nested iteration
Using Clustered Index

The data was not returned as part of the query, and SQL Server stated that the clustered index is used. If a non-clustered index is used, it tells you the name of the non-clustered index. Because there's only one clustered index at a time on a table, there's no need to tell you the name. Remember that you can always see the indexes that are on a table with the sp_helpindex stored procedure.

SET NOEXEC OFF
SET SHOWPLAN OFF
GO
EXEC sp_helpindex indextab
GO
index_name           index_description               index_keys 
-----------          ------------------              ----------- 
CL_MYINDEX           clustered located on default       col1 
 (1 row(s) affected)

All you've seen so far is what choice SQL Server made. The other part of the puzzle that you fill in is how SQL Server makes it decision. For the most part, the optimization is done based on how much work it takes to answer your question. SQL Server wants to answer your question as quickly as it can, and by doing the least amount of work. That usually translates into minimizing the amount of logical I/O that is done.

Logical I/O is data access by SQL Server. Each logical I/O means that SQL Server has read a single 2K page. The total number of pages accessed will always equal the number of logical I/Os. SQL Server differentiates between logical and physical I/O. If a query takes 50 logical I/Os and only 20 physical I/Os, it means that 30 of the 50 pages it needed were already in memory. The other 20 pages had to be retrieved physically from the disks and brought into memory so that they could be read--hence the term physical I/O. SQL Server attempts to minimize the amount of physical I/O access. You can see the amount of I/O a query will use by using the statement SET STATISTICS IO ON. Make sure you turn off the NOEXEC option before you use the statistics IO command, because you must actually run a query to get statistics I/O information.

SET STATISTICS IO ON
SET SHOWPLAN ON
GO
SELECT col1,col2 
FROM indextab
where col1 < 3
GO
STEP 1
The type of query is SELECT
FROM TABLE
indextab 
Nested iteration
Using Clustered Index
col1        col2

----------  ------------- 
1           adam

2           bob

(2 row(s) affected)
Table: INDEXTAB  scan count 1,  logical reads: 3,  
physical reads: 3,  read ahead reads: 0

It took two logical reads to answer this query: One read to get the root page of the clustered index, and one to read the leaf page of the index. Remember that because it's a clustered index, the leaf page is the data page. SQL Server normally favors using a clustered index.

In this figure you can see how you can discern these two logical reads. SQL Server must read the index's root page, and then the index's leaf level to return the data. This index has no intermediate pages (it's too small).

Now that you've seen how SQL Server wants to access your tables, what can you do about it? In addition, what do you do if you've created indexes and you're sure that they should be used, yet SQL Server won't use them? You can override the optimizer.

Figure 13.6. Your clustered index.

Overriding the Optimizer

You can specify what's known as optimizer hints. The first thing to consider is that most of the time the SQL Server optimizer is correct. If you're still sure that you are right and that SQL Server is wrong, you've updated your statistics, run DBCC UPDATEUSAGE, and checked the SHOWPLAN output, then you can use optimizer hints. Each of these options is examined later in this chapter.

To select an index, you specify either the index name or the index number as part of your select statement. You can find an index's number by querying the sysindexes system table in your database.

SELECT col1, col2
FROM indextab (index = CL_MYINDEX)

This query normally uses a table scan, but in this case you told it to use the clustered index that you created earlier.


WARNING: It always uses the index you have selected, regardless of what SQL Server thinks is a proper choice. SQL Server will never reevaluate your selection, so remember to retest your choice periodically. This is especially true when you apply service packs or upgrade SQL Server. Therefore, it's rarely necessary--and sometimes downright dangerous--to use optimizer hints.

How would you force SQL Server to use a table scan? It involves understanding a little about how indexes are numbered in the sysindexes table. Non-clustered indexes are numbered between 2 and 250. If a clustered index exists, it has an index number of 1. If no clustered index exists, a row with index number 0 is in the sysindexes table. It's possible to specify the index number in the optimizer hint rather than using the index name. Hence, the previous SQL statement might also look like this (assuming that the index stateindex had an index number of 3):

SELECT col1, col2
FROM indextab (INDEX = 1)

You write your query like the following to force a table scan. You can see that SQL Server does indeed do exactly as you've asked and use a table scan.

SET SHOWPLAN ON
SET NOEXEC ON
GO
SELECT col1, col2
FROM indextab (INDEX = 0)
WHERE col1 < 3
STEP 1
The type of query is SELECT
FROM TABLE
indextab
Nested iteration
Table Scan

Index Maintenance

Indexes need to be maintained to remain efficient and useful over time. You must manually maintain these indexes (or set up automated plans to do so). Either way, you need an understanding of the basics of index maintenance.

Update Statistics

SQL Server knows about your data through a special page known as the distribution page. The distribution page contains information about how data is distributed throughout the table, and calculates the average number of duplicates for the indexed columns. When you create an index, if there is already data in the table, a distribution page is created and populated with information about the columns that you have indexed. There is not a distribution page if you create an index on a table and then add the data. Without this page, SQL Server must guess about how your data is distributed. When your query contains a clause, such as state = `CA', SQL Server consults the distribution page to guess how many data rows will be returned. The more current your distribution page, the better job SQL Server will do selecting the proper index (or deciding to use a table scan).

SQL Server does not automatically maintain this distribution page, and in the case of an index created on an empty table, it won't create a distribution page. Run the UPDATE STATISTICS command to update the distribution page with current information.

UPDATE STATISTICS [[database.]owner.]table_name [index_name]

Where database, owner, table_name, and index_name all have the same meanings you've used before.

It's a good idea to run the UPDATE STATISTICS command when your data changes, so that SQL Server knows how your data is physically distributed. As a practical matter, unless your data is read-only, you should update statistics as frequently as is practical. Once a week on each table in the database is a good idea.

DBCC UPDATEUSAGE

DBCC UPDATEUSAGE corrects information in the SQL Server system table sysindexes. The information in that table tells SQL Server things like how many pages are in the table and how many pages are used by each index. That information, just like the distribution page, is not maintained automatically. Run DBCC UPDATEUSAGE to update the information.

DBCC UPDATEUSAGE ({0 | database_name} [, table_name [, index_id]])

If you specify a database name, the information in sysindexes will be updated for all tables and indexes. You can be as specific or as generic as you'd like. If you specify just a table name, data for all indexes on that table is updated. If you want to update a single index, you must specify the index number, which you can find by looking in the sysindexes table.

Running DBCC UPDATEUSAGE against a database does not only help the optimizer, it helps with the output of sp_spaceused. The sp_spaceused stored procedure shows you how much space is being taken by objects in the database. It does so by reading the information in the sysindexes table. If the information in sysindexes is incorrect, the sp_spaceused results are wrong. This also applies to space and sizing information in SQL Enterprise Manager. Therefore, it's a good idea to run DBCC UPDATEUSAGE regularly. It's a good idea to run DBCC UPDATEUSAGE at and UPDATE STATISTICS at the same time. When these commands have been run regularly, SQL Server has the most accurate information available to make its optimization decisions.

DBCC SHOWCONTIG

Your indexes need to be periodically reorganized. What this means is that over time, indexes do not maintain their fillfactors or become fragmented, and therefore become less useful. You can use the DBCC SHOWCONTIG command to discover whether your indexes are fragmented and need to be reorganized.

DBCC SHOWCONTIG (table_id, [index_id])

table_id is the internal table number found by either selecting the value from the sysobjects table, or by getting the value with the object_id () function. The index_id is the value of the index number found in sysindexes.

Here's a script you can run to look at the results of DBCC SHOWCONTIG. Don't worry too much about the syntax, it will all make sense by Day 21! Make sure that you run this in a new query window (or turn off showplan and any other options you might have enabled before).

USE PUBS
GO
DECLARE @tableid int
SELECT @tableid = object_id (`INDEXTAB')
DBCC SHOWCONTIG (@tableid)
GO
(1 row(s) affected)
DBCC SHOWCONTIG scanning `INDEXTAB' table...
(1 row(s) affected)
[SHOW_CONTIG - SCAN ANALYSIS]                                               
-------------------------------------------------------------------------- 
Table: `INDEXTAB' (1472008275)  Indid: 1  dbid:4
TABLE level scan performed.
- Pages Scanned................................: 40
- Extent Switches..............................: 4
- Avg. Pages per Extent........................: 8.0
- Scan Density [Best Count:Actual Count].......: 100.00% [5:5]
- Avg. Bytes free per page.....................: 1006.0
- Avg. Page density (full).....................: 50.05%
- Overflow Pages...............................: 0
- Disconnected Overflow Pages..................: 0
DBCC execution completed. If DBCC printed error messages, 
see your System Administrator.

There's a significant amount of information here, but you need to focus on just two or three key items. You can see on the first line, where Indid: 1 indicates that there is a clustered index on the table. The next value to examine is the Scan Density. The higher the percentage, the better shape your table is in. The other number to look at is the Avg. Page density (full). You want this number to be as close to your fillfactor as possible.

When these numbers start to get somewhat low (there's no concrete number to use), it's probably a good time to reorganize your indexes. You could drop and then recreate your index. If it's a clustered index, you can use the SORTED_DATA_REORG option to speed that operation. This puts the index back into the original shape, including reapplying your fillfactor options. If you drop and recreate the clustered index, you reorganize the data pages as well as the index structure.

DBCC DBREINDEX

There's another option that might be a better choice: the DBCC DBREINDEX command.

DBCC DBREINDEX ([`database.owner.table_name' [, index_name
[, fillfactor [, {SORTED_DATA | SORTED_DATA_REORG}]]]])
[WITH NOINFOMSGS]

Many of the options remain exactly where they are in the CREATE INDEX statement, so you should refer to that statement for a review of the sorted data and fillfactor options. However, the original fillfactor that was specified when the index was created is used if you specify a fillfactor of 0. Otherwise, it is set to the fillfactor value you specify. However, if you want to specify an optional parameter, you must specify all of the previous options. For instance, to specify a fillfactor, you must specify an index name, even if you provide a blank name.

DBCC DBREINDEX (`authors','', 0) to rebuild the indexes on the authors table (all of them) with the original fillfactor.

If you specify a table, such as the following, every index on the table will be dropped and recreated, as shown in this output:

DBCC DBREINDEX (`indextab')
GO
Clustered index `CL_MYINDEX' is being rebuilt.
DBCC execution completed. If DBCC printed error messages, 
see your System Administrator.
Otherwise, you can specify an individual index. This is the creation of a non-clustered index on your table:CREATE INDEX noncl_col2_indextab on indextab (col2)
GO
DBCC DBREINDEX (`indextab','noncl_col2_indextab')
Non-clustered index `noncl_col2_indextab' is being rebuilt.
DBCC execution completed. If DBCC printed error messages, 
see your System Administrator.


NOTE: Be careful when specifying a fillfactor with DBCC DBREINDEX. If you don't specify a particular index, but use the fillfactor setting, such as the following, it resets the fillfactor on all indexes on the table (in this case to 50). That may or may not be what you've intended.
	   DBCC DBREINDEX (`indextab', `', 50)


DBCC DBREINDEX has one advantage over dropping and recreating indexes. Tomorrow you look at referential integrity and ANSI constraints. Some of these constraints create indexes behind the scenes. DBCC DBREINDEX can safely drop and recreate the indexes created with constraints, whereas the only other way to do this is to drop and recreate the constraints. That can be extremely difficult.

Enterprise Manager

SQL Enterprise Manager has excellent options available from the Manage Indexes dialog box. Highlight the database you want to work with (it's the pubs database here), and then select Manage  Indexes from the menu to be presented with this dialog box (Figure 13.7).

Figure 13.7. The Manage Indexes dialog box.

You can do just about every option available that you've seen so far today from this dialog box. If you'd like to create, remove, or rename an index, it can be done here. There are two other options available here. If you click Rebuild, it causes a DBCC DBREINDEX to be run for this index. If you click Check Size, it issues a DBCC UPDATEUSAGE. If you click the Distribution button, you can view the Statistics Distribution page, which might look something like Figure 13.8.

Figure 13.8. Distribution dialog box.

It shows you the page and tells you when statistics were last updated. You can also do that from Transact-SQL with the STATS_DATE function.

SELECT STATS_DATE(table_id, index_id)

The table_id and index_id are the objects' internal numbers, so it's a little difficult to run without some advanced Transact-SQL. For instance, to find the date the statistics were last updated on your indextab table, you could run this script:

Use pubs
Go
Declare @tabid int
Declare @indid int
Select @tabid = object_id (`indextab')
Select @indid = 1 /* the clustered index */
SELECT STATS_DATE (@tabid, @indid)

You get output similar to this

(1 row(s) affected)
(1 row(s) affected)
                            
-------------------------- 
Dec 2 1997 12:46AM          
(1 row(s) affected)

Index Selection

Now that you've seen the options available to you and how to build indexes, your next logical question is this: Which columns should I index, and how should I index them?

Good candidates for indexes include:

In general, you should index columns that are included in your WHERE clause using conditions that don't include functions or calculations. In other words, the following code probably indicates that a good candidate to index is the emp_id column:

Where emp_id = 5
This code probably won't be able to use an index, so there's no point of creating an index (for this query, anyway) on the emp_id column: where emp_id/5 = 210

Statements such as the first WHERE clause are known as searchable arguments. Searchable arguments (also called SARGs) are the types of arguments that can be used by SQL Server. Besides exact matches, you can sometimes use approximate matches on character columns. For example, the following is searchable

where emp_lname like `w%'

The code is not.

where emp_lname like `%w%' 

Image trying to do a search like this yourself in the telephone book. Could you easily find the second condition? SQL Server has the same problem you would--as long as the first letter is specified, you have somewhere to start.

One other point to consider is that not every column is a good candidate to index. You should not index any of the following:

The last option depends on your application and table, but it's a good bet that a char(255) is an unlikely candidate for an index.

You must be careful, however, because each index introduces maintenance, space usage, and performance issues. It's generally recommended that you avoid having more than 3-4 indexes on a table. There are quite a few exceptions for every rule, but this is a good guideline to follow. After deciding which columns to index, you must determine which kind of index to use: clustered or non-clustered.

Clustered Index Choices

You only get one clustered index, so it's a good idea to choose this one first. Good candidates for clustered indexes include:

Another choice might be to index Primary Key or Foreign Key columns, which you examine tomorrow.

Non-Clustered Index Choices

After you've selected your clustered index, the rest of the indexes you've decided to create must be non-clustered. Good candidates include:

Example: If you had an index on the au_lname and au_fname columns of the authors table in the pubs database, the following query could be answered entirely from the index without accessing the data:  SELECT au_fname
from pubs..authors where au_lname = "White"

The key issue is that you only index columns that are used in the WHERE clause of your queries, and that you verify that the indexes you choose are used by SQL Server using the Showplan option.

Summary

Indexes are not required, but are extremely useful for speeding your queries. They are essentially the "Go Fast" option in SQL Server. There are a variety of options available during the creation of your indexes, and there are several maintenance issues that need to be addressed. Your indexes help you resolve queries faster, and provide better overall performance as long as you maintain them properly.

Remember that you will probably not need to override the optimizer, but that option is available to you in the form of optimizer hints. It requires an understanding of the query, the data being queried, and the available indexes to make a good choice for optimization. If you are not comfortable with your knowledge level regarding these things, you should let SQL Server pick the indexes for you--it's right almost every single time if you've performed your index maintenance properly.

Q&A

Q What are some of the differences between clustered and non-clustered indexes?

A
The number of indexes available (1 for clustered, 249 for non-clustered), and the contents of the leaf level of the index (data for the clustered index, rows pointing to the data for the non-clustered index).

Q How does SQL Server determine when to use an index?


A
It analyzes the WHERE clause in your query, as well as the statistics from the distribution page and the information stored in the sysindexes table.

Q When would I specify a fillfactor setting on an index?


A
When you want to leave free space in your indexes to allow for growth of the data, and to reduce page splits.

Q How do I override SQL Server's choice of indexes?


A
With optimizer hints such as FROM indextab (index = 1).

Workshop

The Workshop provides quiz questions to help you solidify your understanding of the material covered, as well as exercises to provide you with experience in using what you've learned. Try to understand the quiz and exercise answer before continuing on to the next day's lesson. Answers are provided in Appendix B.

Quiz

1. How do I force SQL Server to choose a table scan?

2. How much free space do I need to create a clustered index?

3. What option do I turn on to verify the index selection of SQL Server?

Exercises

Write a query to select data from the sales table, where title_id = BU1032, and show SQL Server's index choice and how much I/O it will take.

Override the choice and use the clustered index.

Create a new table, along with a clustered index and two non-clustered indexes, each with a different fillfactor. Then use DBCC DBREINDEX to reorganize the indexes and reset the fillfactor to 50.

Do the same thing from step 3 of the quiz, but accomplish the DBCC DBREINDEX and fillfactor changes from within SQL Enterprise Manager.


Previous chapterNext chapterContents


Macmillan Computer Publishing USA

© Copyright, Macmillan Computer Publishing. All rights reserved.