Chapter 17
Troubleshooting SQL Server



So far, each chapter in this book has covered common problems and resolutions. This chapter steps back and focuses on how SQL Server alerts you to possible problems with databases, objects, or the server; how to find more information about the problem; how to fix the problem; and how to get help in determining and fixing your problem. You also learn about several tools that ship with SQL Server to help you track and debug problems, as well as other resources readily available to aid in problem resolution. Start by taking a look at SQL Server error messages.

SQL Error Messages

If you run a query and accidentally make a mistake by entering a table that does not exist in the database, what happens? SQL Server returns an error message. Actually, SQL Server reacts to all errors in the same manner, whether those errors are generated by users, databases, objects, or the system. SQL Server returns a formatted error message and/or writes the error message to the error log and/or event log. Here is a quick example that executes a SQL statement to update a nonexistent table in the pubs database. The SQL statement for the example is as follows:

UPDATE new_authors
Set author1 = "Spenik",
author2 = "Sledge",
title="Microsoft SQL Server DBA Survival Guide"

When the statement is executed, the following error message is returned:

Msg 208, Level 16, State 1
Invalid object name `new_authors'.

The preceding error message demonstrates the standard message format for error messages returned by SQL Server.


TIP: The first thing presented in the error message is the message number, severity level, and the state. To most users, these numbers are just garbage to be ignored, so they skip down to the message and try to resolve the problem. In reality, the error message number is very useful for obtaining more error information. You can use the severity levels to help find errors that need to be handled. When tracking a problem, always write down all the error information, including the message number, severity level, and state. In many cases, these will be of more assistance than the actual message.

Examine the format of a standard SQL Server error message.

Error Message Number

Each error message displayed by SQL Server has an associated error message number that uniquely identifies the type of error.


TIP: You can define your own error messages. User-defined error message numbers must be greater than 50,000 and less than 2,147,483,647. You can use the system stored procedure sp_addmessage (described in detail in Appendix C) to add the error message to the system table, sysmessages. From a trigger or stored procedure, you can use the RAISERROR statement to report a user-defined error message to the client and SQL Server.

Error Severity

The error severity levels provide a quick reference for you about the nature of the error. The severity levels range from 0 to 25.

Severity Level Meaning
0 to 10 Messages with a severity level of 0 to 10 are informational messages.
11 to 16 Severity levels 11 to 16 are generated as a result of user problems and can be fixed by the user. For example, the error message returned in the invalid update query used earlier had a severity level of 16.
17 Severity level 17 indicates that SQL Server has run out of a configurable resource, such as user connections or locks. Severity error 17 can be corrected by the DBA, and in some cases, by the database owner.
18 Severity level 18 messages indicate nonfatal internal software problems.
19 Severity level 19 indicates that a nonconfigurable resource limit has been exceeded.
20 Severity level 20 indicates a problem with a statement issued by the current process.
21 Severity level 21 indicates that SQL Server has encountered a problem that affects all the processes in a database.
22 Severity level 22 means that a table or index has been damaged. To try to determine the extent of the problem, stop and restart SQL Server. If the problem is only in the cache and not on the disk, the restart will correct the problem. Otherwise, use DBCC to determine the extent of the damage and the required action to take.
23 Severity level 23 indicates a suspect database. To determine the extent of the damage and the proper action to take, use the DBCC commands.
24 Severity level 24 indicates a hardware problem.
25 Severity level 25 indicates some type of system error.


NOTE: Severity errors 19 through 25 are fatal errors. When a fatal error occurs, the running process that generated the error is terminated (nonfatal errors continue processing). For error severity levels 20 and greater, the client connection to SQL Server is terminated.

State Number

The error state number is an integer value between 1 and 127; it represents information about the invocation state of an error.

Error Message

The error message is a description of the error that occurred. The error messages are stored in the sysmessages system table. Figure 17.1 shows a query result of the sysmessages table.


NOTE: The Query Analyzer is the new SQL Server 6.5 name for the SQL Server 6.0 Query window.

Figure 17.1.
Query results of
sysmessage using the SQL Query tool.

Using the Error Message Number to Resolve the Error

Earlier in this chapter, you learned that, by using the error message number, you could quickly retrieve detailed information about the error and possible ways to resolve the error. How, you may ask? Books Online!


Information at Your Finger Tips
Isn't technology great! I believe that to really appreciate Microsoft's Books Online, you have to have been a Sybase DBA from the 4.2 UNIX days. When an error would occur that displayed the error number, you jotted down the error number and then tried to locate the error messages and the troubleshooting guide. Of course, the book was never in the same place. And if you had my luck, once you found the book, the error number was never in the book--it always fell within the "reserved" section or something similar. New DBAs who start with SQL Server 6.x will truly be spoiled by Microsoft's Books Online.

Hopefully, when you installed SQL Server, you included the Books Online utility shown in Figure 17.2.

Figure 17.2.
The SQL Server Books Online.


To see how to use Books Online to find more information on the error message number displayed during the invalid query example (error message number 208), follow these steps:

  1. Select Tools, Query from the menu in the SQL Server Books Online window. The Query dialog box appears (see Figure 17.3).


    Figure 17.3.
    The Query dialog box.



    NOTE: In Books Online in SQL Server 6.0, the Query dialog box is found by selecting Tools, Find. In version 6.0, the dialog box is called the Find dialog box instead of the Query dialog box.

  2. The Query dialog box allows you to quickly search Books Online for specific information. In the Query combo box, type the error message number: 208. In the Scope of Search frame, select the Entire Contents radio button. In the Topic Area To Search frame, select the Title Only radio button.

  3. To run the search, click the Run Query button. The query runs, searching for 208 in the title of any of the book topics. If one or more items are found, they are displayed in a Query Results dialog box (see Figure 17.4).

    Figure 17.4.
    The Query Results dialog box.

  4. To view the document(s) found in the search, double-click the item or select the item and click the Display button. The detailed information for the error message number, including a detailed explanation and the action to take, is displayed. You can even print the document! Just think, no more trying to locate a troubleshooting or error message book! No more flipping through pages searching for error messages; if the error number is not in the book, you know immediately! The detailed information found for error number 208 is displayed in Figure 17.5.

Figure 17.5.
The Books Online description of error message 208.

Deciphering the Error Log

The error log is a standard text file that holds SQL Server information and error messages and can provide meaningful information to help you track down problems or to alert you to potential or existing problems. SQL Server maintains the current error log and the six previous error log files. The current error log filename is ERRORLOG; the previous error log files, referred to as archived error logs, are named ERRORLOG.1 (most recent) to ERRORLOG.6 (the oldest). The default location of the error log file is in the \LOG directory off the SQL Server home directory. The following is an example of a SQL Server error log:

96/03/04 22:23:51.43 spid1 server name is `SPENIK'
96/03/04 22:23:51.47 spid1 Recovering database `model'
96/03/04 22:23:51.48 spid1 Recovery dbid 3 ckpt (394,4) oldest tran=(394,0)
96/03/04 22:23:51.56 spid1 Clearing temp db
96/03/04 22:23:52.83 kernel Read Ahead Manager started.
96/03/04 22:23:52.85 kernel Using `SQLEVN60.DLL' version `6.00.000'.
96/03/04 22:23:52.91 kernel Using `OPENDS60.DLL' version `6.00.01.02'.
96/03/04 22:23:52.92 kernel Using `NTWDBLIB.DLL' version `6.50.163'.
96/03/04 22:23:52.96 ods Using `SSNMPN60.DLL' version `6.3.0.0' to listen on `\\.\pipe\sql\query'.
96/03/04 22:23:55.01 spid10 Recovering database `pubs'
96/03/04 22:23:55.04 spid11 Recovering database `msdb'
96/03/04 22:23:55.07 spid11 Recovery dbid 5 ckpt (1799,18) oldest tran=(1799,17)
96/03/04 22:23:55.08 spid10 Recovery dbid 4 ckpt (1090,28) oldest tran=(1090,27)
96/03/04 22:23:55.10 spid11 1 transactions rolled forward in dbid 5.
96/03/04 22:23:55.10 spid10 1 transactions rolled forward in dbid 4.
96/03/04 22:23:55.38 spid1 Recovery complete.
96/03/04 22:23:55.40 spid1 SQL Server's default sort order is:
96/03/04 22:23:55.40 spid1 `nocase' (ID = 52)

The error log output includes the time and date the message was logged, the source of the message, and the description of the error message. If an error occurs, the log contains the error message number and description.


TIP: Spend some time looking at and understanding the messages in the error log, especially the proper startup sequence messages. This knowledge can come in handy in times of trouble!

You can view the error log using any text editor, such as Notepad, or you can use the SQL Server Enterprise Manager. To use the Enterprise Manager, select Error Log from the Server menu. The Server Error Log dialog box appears (see Figure 17.6). To view archived error logs, use the combo box shown in Figure 17.6.

Figure 17.6.
The Server Error Log dialog box.

Using the Event Viewer

SQL Server also logs information and error messages to the Windows NT event log. The event log is used by Windows NT as a repository for the operating system and applications to log informational and error messages. The Event Viewer is located in the Windows NT Administrative Tools group. The advantage of using the Event Viewer over the error log is that errors are easy to spot because NT highlights all error messages with a red stop sign; it highlights information messages with a blue exclamation mark (see Figure 17.7).

Figure 17.7.
The Windows NT Event Viewer, showing the application event log.

To view the detailed error message description, severity level, and state, double-click the line item. An Event Detail dialog box appears. The Event Viewer also provides a search utility that enables you to search for specific types of events in the event log. For example, you can search for all the error messages in the event log.

Killing a Process

A SQL Server user process is a task or request made to SQL Server by a user. Occasionally, you may be required to halt (stop) a user process before it completes. Perhaps the user has incorrectly formatted a query or launched a massive transaction that will take hours to complete and that has blocked out other users from necessary table information. Whatever the case may be, you can bet that sooner or later someone will ask you to stop his or her process or someone else will complain about not getting any information back. The proper terminology for halting a process is called killing a process, which sounds much more severe than just halting or stopping the process. When you kill a process, you completely remove the process from SQL Server.


TIP: The number one reason to kill a process is interference with other users' p rocessing (that is, the rogue process prevents users from getting to the required information by "blocking" them out, as in exclusive table locks, for a lengthy transaction).

SQL Server assigns each task a unique identity number called a spid (system process ID). To view the currently running processes and their spids, issue the system stored procedure sp_who, which has the following format:

sp_who [login id | `spid']

In this syntax, login_id is the specific user login ID for which you want to report activity and spid is a specific process ID for which you want to report activity.

Issuing the sp_who command with no parameters displays a report on all the current processes on SQL Server, as in the following example:

spid status loginame hostname blk dbname cmd
------ ---------- ------------ ---------- ----- ---------- ---------------
1 sleeping sa 0 master MIRROR HANDLER
2 sleeping sa 0 master LAZY WRITER
3 sleeping sa 0 master CHECKPOINT SLEEP
4 runnable sa 0 master RA MANAGER
10 sleeping sa SPENIK 0 master AWAITING COMMAND
11 runnable sa SPENIK 0 master SELECT

To kill a process, use the KILL command, which has the following syntax:

KILL spid

In this syntax, spid is the system process ID of the process you want to terminate.

You can kill only one spid at a time and the statement cannot be reversed. Once you have issued the command, the process will be killed. To kill spid number 11 shown in the previous sample, you issue the following command:

kill 11


NOTE: In pre-system 10 versions of Sybase and pre-Windows NT versions of Microsoft SQL Server, the KILL command did not always work. If the spid was a sleeping process, the only way to kill the process was to shut down the server. The inability to kill a process with the KILL statement was a kind of joke among DBAs; the processes were nicknamed zombies. A zombie process was a serious problem when a process really did need to be shut down and the KILL command was ineffective. Microsoft corrected the problem in SQL Server for Windows NT 4.21.

You also can kill a process using the SQL Server Enterprise Manager by performing the following steps:

  1. After you select a server from the Enterprise Manager, select Current Activity from the Server menu. The Current Activity dialog box appears (see Figure 17.8).

    Figure 17.8.
    The Current Activity dialog box.

  2. All the current processes running on SQL Server as well as the spids of the processes are displayed in the Current Activity dialog box.


    NOTE: In SQL Server 6.0, the Current Activity dialog box is found under the Tools menu.

  3. To kill a process, select the process you want to terminate and then click the Kill Process toolbar button in the Current Activity dialog box. The selected process is terminated.

Viewing Detailed Process Activity

The day will come when your phone is ringing off the hook because suddenly the system is slow or a user has been waiting a very long time for a report to complete. With SQL Server 6.x, you can easily view the current activity of the system using the Current Activity dialog box (refer back to Figure 17.8). Click the Detail Activity tab in the Current Activity dialog box; the Detail Activity page of the Current Activity dialog box appears (see Figure 17.9).

Figure 17.9.
The Detail Activity page of the Current Activity dialog box.


The Detail Activity page allows you to quickly view the current state of executing processes and helps you determine possible problems such as a blocked process (covered in detail in Chapter 23, "Multi-User Considerations").

You can view the last command executed by a process or the resource usage of the process (CPU and disk usage) by double-clicking the process in the Current Activity dialog box (either the User Activity or Detail Activity page). The Process Information dialog box appears (see Figure 17.10). The Process Information dialog box not only shows you the last command executed by the process but also the CPU and disk usage.

Figure 17.10.
The Process Information dialog box.

Using DBCC and Trace Statements to Troubleshoot

DBCC stands for Database Consistency Checker. DBCC consists of a series of commands that perform many different functions on databases and database objects. You use DBCC commands to perform database and database object maintenance but you also use the output of DBCC commands to find errors--and in some cases, fix them. Trace flags can be used to provide additional information about SQL Server actions such as process deadlock information (trace flags 1204 and 1205) or the estimated and actual cost of a sort (trace flag 326).


NOTE: DBCC and trace flag settings are detailed in Appendix E. It is highly recommended that you read and re-read Appendix E. Understanding when and how to use DBCC can save you a lot of headaches.

For starters, when tracking errors or problems, you will want to examine the output of your daily DBCC maintenance commands for standard SQL Server error messages. If you find an error message in a DBCC output, treat the error message like any other SQL Server error message and use Books Online or technical support to resolve the error. When you call technical support or find the error number in Books Online, one or many of the resolution steps may be to execute DBCC command(s) to resolve the problem. Suppose that you are trying to create an index in sorted order on the authors table, column au_lanme, in the pubs database; the CREATE INDEX statement fails with the following error message:

Msg 1520, Level 18, State:1
Sort failed because dpages in the Sysindexes row for table `authors' in database
`pubs' had an incorrect value. Please run DBCC CHECKTABLE on this table to correct
the value, then re-run your command.

The description in the error message suggests that you execute the DBCC command with the option CHECKTABLE. If you search Books Online for error 1520, you see a more detailed description of why the error message occurred and the following Action section (taken from Administrator's Companion, Books Online):

Action
To correct the page count, use one of the following statements:
dbcc tablealloc(tablename)
dbcc newalloc(databasename)
dbcc checktable(tablename)
After running DBCC, you should be able to create the index.

Become familiar with DBCC commands. Use the correct DBCC options (if any) to fix a problem in your database when instructed to do so.


Follow the Clues
Last night, while working late to help a group fix a SQL Server problem, I was reminded of a very important tip when troubleshooting SQL Server problems: Follow the clues. Do not speculate first. The reason I was reminded of this tip is that, when I arrived on the scene, everyone involved was ready to blame the problem on the new release of SQL Server that they had upgraded to two weeks ago (the upgrade was from release 4.21 to release 6.0--and I performed the upgrade). When I looked over the error log and the general state of the system, I quickly ruled out the upgrade. The problem they were having was running a new stored procedure that pulled information in from a mainframe flat file and massaged and reformatted the data to be output to another flat file to feed the mainframe. Whenever the stored procedure ran, the transaction log filled up before the process could complete. They had expanded the log several times and still the stored procedure could not run to completion. One important fact to mention is that the procedure had been tested with 500 rows on versions 4.21 and 6.0 but now they were attempting to run 16,000 rows.

One problem I spotted and fixed immediately with DBCC commands was that the transaction log's size was invalid and would not respond correctly to the DUMP TRANSACTION WITH NO_LOG command. Once the transaction log problem was corrected, I set up a threshold to dump the log when it was 80 percent full. Still the procedure would not run. When I examined the stored procedure, I found nothing unusual. A transaction was started and then the 16,000 rows were copied to four tables. This organization had several stored procedures that performed the same operation and nothing looked unusual with the stored procedure. Again the cry of "maybe it's version 6.0--let's try 4.21" was echoed. Because there was no evidence to suggest that it was a version 6.0 problem, we tried the procedure again. This time we monitored the server from the version 4.21 Object Manager using sp_who, sp_lock, and DBCC SQLPERF(LOGSPACE) to track what was going on. It was not long before we noticed that the spid executing the stored procedure seemed to be stuck on an UPDATE statement. That seemed odd because the only UPDATE occurred outside of the transaction and had already occurred. Again moans about the upgrade or the corrupted transaction log rose across the room. Except for the developer, who quietly stated, "I bet there is an update trigger on the table I'm inserting into."

Well, she was right. Not only was there an update trigger but the SQL statement was incorrectly copying 16,000 rows into a table with an update trigger when only 39 rows should have been copied. The problem was quickly solved. The moral of this story is that troubleshooting can be difficult and often consists of more then one clue (such as an upgrade, a corrupted transaction log, and a new process).

Try not to jump to conclusions. Solve each problem, one at a time, and if something does not make sense, keep searching for the real reason the process fails so that you can fix it. It is very easy to go down the wrong road. Oh, by the way: beware of triggers on tables when troubleshooting a developer process. They are easy to forget and, in some cases, may be the unexpected root of the problem!

Table Fragmentation

I have heard a lot of noise on various SQL Server forums about table fragmentation. Table fragmentation occurs on tables that have a lot of insert/update/delete activity. As the table is modified over a period of time, the fullness of each data page begins to vary (that is, pages are not full). You can defragment the table by dropping and re-creating the clustered index on the table, which packs each page with the fill factor amount of data. Not only does this improve performance when you read the table, it also can increase the amount of available database space. Because rebuilding a clustered index on a very large table can take a fair amount of time, SQL Server 6.x provides the DBCC SHOWCONTIG command that enables you to determine how fragmented a table or index is (that is, whether or not you need to rebuild it). The SHOWCONTIG option has the following format:

DBCC SHOWCONTIG (table id, [index id])

In this syntax, table id and index id are the IDs of the object found in the sysobjects table of the database. For example, the following is the command line and output from a DBCC SHOWCONTIG command performed on the authors table in the pubs database:

DBCC SHOWCONTIG(16003088)
DBCC SHOWCONTIG scanning `authors' table...
[SHOW_CONTIG - SCAN ANALYSIS]
--------------------------------------------------------------------------
Table: `authors' (16003088) Indid: 1 dbid:4
TABLE level scan performed.
- Pages Scanned................................: 1
- Extent Switches..............................: 0
- Avg. Pages per Extent........................: 1.0
- Scan Density [Best Count:Actual Count].......: 100.00% [0:1]
- Avg. Bytes free per page.....................: 89.0
- Avg. Page density (full).....................: 95.58%
- Overflow Pages...............................: 0
- Disconnected Overflow Pages..................: 0
DBCC execution completed. If DBCC printed error messages, see your System Administrator.

TIP: For those of you wondering how to find the table ID using sysobjects, the query used for the authors table is shown here. Remember: To get a table ID for a specific table, you must be in the database when you query sysobjects or make a direct reference to the database such as database..sysobjects. The query (executed from the master database) is as follows:

select id from pubs..sysobjects where name = "authors"


Other Sources of Help and Information

In SQL Server 6.x, Microsoft has done a good job of providing useful and valuable information in Books Online. But as a DBA, it is important for you to know that there are many other good sources of information for Microsoft SQL Server (such as this book). But what happens when the problem is beyond the scope of published resources? The following sections discuss some of the options available to you.

Technical Support

When you run across a problem not covered in this book or one of a very critical and urgent nature, it's time to get in touch with your tech support company. If you have purchased SQL Server, you have also (hopefully) purchased a support agreement with Microsoft or with a Microsoft Solution Provider to help you in the event of an emergency. If not, Microsoft can still provide help (you are charged per incident). My experience with Microsoft tech support has been very good. In general, Microsoft's support contracts and agreements are much less expensive than those of some of the other RDBMS companies. A Microsoft Solution Provider is an independent organization that provides consulting and integration services for Microsoft products and also can provide support. Before calling tech support, be sure to have all the information required to start an incident report. You should have the following information:


TIP: If the problem is one that can be reproduced using SQL commands or a sequence of events, have this information written down so that the tech support person can duplicate the results.

Microsoft TechNet

Before there was SQL Server 6.x Books Online, there was Microsoft TechNet. Microsoft TechNet is a monthly CD subscription that provides a wealth of information about Microsoft products. TechNet provides product white papers, release notes, current patches and drivers, and a knowledge base of product information and problem resolution. TechNet is fairly inexpensive for a yearly subscription of 12 monthly CDs packed with information. (I won't quote a price because it's subject to change!) To find out more about TechNet, call 1-800-344-2121.


TIP: TechNet is my second line of defense. If I can't resolve the problem based on my knowledge and Books Online, I check TechNet for a information on the problem. Do yourself a favor and subscribe!

The advantage of TechNet is that it is a monthly CD, so problem resolution not available when SQL Server 6.x shipped can be placed in the TechNet knowledge base for your immediate use. TechNet's search facility is similar to the Books Online search facility (or vice-versa, since TechNet was here first), as shown in Fig- ure 17.11. As you can with Books Online, you can perform searches on error numbers or keywords and get a list of articles that contain the keyword or error number.

Figure 17.11.
Microsoft TechNet.

Online Services

To obtain interactive support without using tech support, try one of the interactive Microsoft SQL Server forums. Currently, forums exist on CompuServe (GO MSSQL) and on the new Microsoft Network. Using the online services, you can search for existing messages that deal with problems you are experiencing. You can also post messages asking for help from your peers. I find the online services to be very useful and well worth the small monthly fee. Many individuals have their problems resolved on the online services, but most of these problems are of a noncritical nature because turnaround time for a posted question is an unknown.

User Groups

SQL Server user groups can provide a forum in which you can discuss problems or issues with your local peers. They also tend to enlighten you on current products and future releases.

Between the Lines

Following are some of the important points to review when you troubleshoot SQL Server:

Summary

You should now know where to search for SQL Server error messages; you should also understand the format and meaning of SQL Server error messages. This chapter provided the foundation for your understanding of how to interpret and research the error messages you may receive during routine maintenance. Another important point in this chapter is that you have to be a good DBA and get involved in your local SQL Server user group or spend a few hours a week in a online service, interacting with your peers and learning more about SQL Server. In Chapter 25, "Developing a SQL Server Maintenance Plan," and Chapter 26, "Automating Database Administration Tasks," you learn how to perform preventive maintenance on your SQL Server to limit the amount of time you spend troubleshooting.


DISCLAIMER


To order books from QUE, call us at 800-716-0044 or 317-361-5400.

For comments or technical support for our books and software, select
Talk to Us.

© 1997, QUE Corporation, an imprint of
Macmillan Publishing USA, a Simon and Schuster Company.