Oracle optimizer can best serve your application needs. Do it interactively if necessary, but most often these ANALYZE runs can be scheduled or triggered by DML events and therefore can be written as noninteractive scripts or stored procedures.
To wrap up the section on optimization, let's cover a few commonly used definitions that, as a DBA, you should understand. A histogram is nothing more or less than a special type of bar chart. When you run ANALYZE COMPUTE or ANALYZE ESTIMATE, Oracle stores the data distributions of the columns for that table. A data distribution is a function that maps each distinct value to its frequency (count or subtotal) or percentage. In Oracle, those distributions are stored numerically, but they could be displayed as histograms if you were to write a program to do so.
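As a sketch of how these statistics are gathered, the following statements assume a hypothetical EMPLOYEES table with a SALARY column; the FOR COLUMNS ... SIZE clause, available in Oracle 7.3, stores a histogram for a skewed column:

```sql
-- Gather exact statistics for the whole table (table, columns, indexes).
ANALYZE TABLE EMPLOYEES COMPUTE STATISTICS;

-- Oracle 7.3 can also store a histogram for a skewed column;
-- SIZE sets the number of histogram buckets (the "bars" of the chart).
ANALYZE TABLE EMPLOYEES COMPUTE STATISTICS FOR COLUMNS SALARY SIZE 10;
```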
The selectivity, or sometimes the selectivity factor, of an index is the number of distinct values in a column divided by the total number of (non-distinct) values. The higher the selectivity factor, the better, and the stronger the justification for creating and using an index on that column. Usually, the fewer the distinct values, the lower the selectivity. (This makes sense because the count of distinct values is the numerator of the selectivity formula.)
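You can compute this ratio yourself with a simple query. This example assumes the same hypothetical EMPLOYEES table; a result near 1 indicates a highly selective column, and a result near 0 a poorly selective one:

```sql
-- Selectivity = distinct values / total (non-distinct) values.
SELECT COUNT(DISTINCT LAST_NAME) / COUNT(*) AS selectivity
FROM   EMPLOYEES;
```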
For example, a column that stores a gender type, such as Male or Female, or a categorical type with a few distinct possibilities, such as marital status with Single, Married, Divorced, or Widowed, is normally not selective and will not provide a good basis for an index. One exception is when one or more of the distinct values in such a column is extremely rare and happens to be queried often (used frequently in WHERE clauses). Suppose, for example, that marital status is a column in a senior citizens table where the value Single is fairly rare, occurring less than 10 percent of the time, and that the Single value is often queried. An index will do nicely for these queries, although it is of no help in querying on the other values. Further, a bitmapped index would work well here. The decision of when to use an index and what type of index to use is covered in the section "Identifying Typical Problems," later in this chapter.
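As a sketch of the senior citizens example, a bitmapped index (available starting with Oracle 7.3) on the low-cardinality column might be created like this; the table and index names are hypothetical:

```sql
-- MARITAL_STATUS has only a few distinct values, but the rare value
-- 'Single' is queried often, so a bitmapped index suits it well.
CREATE BITMAP INDEX seniors_mstatus_bix
    ON SENIORS (MARITAL_STATUS);
```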
An optimizer chooses to read and return data from an index when it is available, as opposed to having to fetch it from the table. This optimizer preference is known as index coverage and usually results in better performance, although occasionally it can result in poorer performance. Suppose, for example, you have a composite index on LAST_NAME, FIRST_NAME, and MIDDLE_INITIAL in an EMPLOYEES table. If you were to run the following query to find all the employees with the last name "SMITH":
SQL> SELECT LAST_NAME
  2> FROM EMPLOYEES
  3> WHERE LAST_NAME = 'SMITH';
then the query is said to be covered by the index. In other words, all the columns requested in the SELECT clause of the SELECT statement are stored, not only in the table but in the index. For obvious reasons, the optimizer normally prefers to use the index alone over using both the index and the table. Why get water from two wells when one supplies all you need? However, index coverage can have an undesirable side effect, which is discussed in the section "Identifying Typical Problems."
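The composite index described above might be created as follows; the index name is hypothetical. Because LAST_NAME is the leading column and is the only column the query selects, the query can be answered from the index alone:

```sql
-- All three name columns live in the index, so any query that requests
-- only these columns is covered by the index.
CREATE INDEX emp_name_idx
    ON EMPLOYEES (LAST_NAME, FIRST_NAME, MIDDLE_INITIAL);
```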
Lastly, one of the most important rules of thumb you should know as a DBA with regard to query tuning is the 5 percent rule. This applies to Oracle 7.x but not older versions. If, given available statistics from an ANALYZE run, the optimizer expects a query to return more than 5 percent of a table's rows, it opts for a full-table scan. With 6.x and 5.x, this percentage was 15 percent and 20 percent, respectively. Often, this is still called the 20 percent rule, although the actual percentage no longer applies to modern Oracle versions. This is vital information for a DBA to be aware of, because the optimizer might make the wrong (less than optimal) choice at times, given your knowledge of the application, and you would need to override it (with a hint or by rewriting the query, for example).
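Such an override might look like the following sketch, which assumes the hypothetical EMPLOYEES table and emp_name_idx index used earlier. The INDEX hint asks the optimizer to use the named index even where it would otherwise choose a full-table scan:

```sql
-- Hint syntax: the comment must immediately follow the SELECT keyword.
SELECT /*+ INDEX(EMPLOYEES emp_name_idx) */ LAST_NAME
FROM   EMPLOYEES
WHERE  LAST_NAME = 'SMITH';
```

A hint is only a request; if the hint is malformed or inapplicable, the optimizer silently ignores it.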
The SQL trace facility, along with the tkprof formatting program, provides the capability to generate and review performance statistics for individual SQL statements. You enable SQL trace at the instance or session level. However, there are a few init.ora parameters you must set to use SQL trace in either case. If they are not already set, set the following parameters:
TIMED_STATISTICS=TRUE
USER_DUMP_DEST=<directory>
MAX_DUMP_FILE_SIZE=<size in operating system blocks>
Clearly, you want TIMED_STATISTICS on if you want SQL trace to be able to collect timing information. USER_DUMP_DEST should be specified, unless you want the trace files to be stored in the default, operating-system-specific location. In addition, MAX_DUMP_FILE_SIZE should be set high enough that the generated trace files are not truncated. To enable SQL trace at the instance level (that is, for all sessions), set the following init.ora parameter:
SQL_TRACE=TRUE
All sessions will then generate trace files. Name the trace files with session-specific information, in addition to instance-specific information, so you can tell which files belong to which sessions, and ensure that there is enough space in USER_DUMP_DEST to hold all those files. Or, run the traces in isolation, sequentially, if possible, so there is no confusion. To enable SQL trace at the session level, issue the following statement:
SQL> ALTER SESSION SET SQL_TRACE=TRUE;
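When you have captured the statements of interest, turn tracing back off in the same session so the trace file does not keep growing:

```sql
SQL> ALTER SESSION SET SQL_TRACE=FALSE;
```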
SQL trace generates statistics only for the duration of the session. Now, to be able to read the information created by SQL trace, you must format it with tkprof. You run the tkprof program at the operating system level (command line). The abbreviated syntax is:
tkprof <tracefile> <outputfile>
An example in UNIX might look like:
hostname% tkprof ora_1776.trc ora_1776.out
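Beyond the two file arguments, tkprof accepts options to sort statements and to generate execution plans. The following sketch uses real tkprof options with an assumed trace file name and example credentials (scott/tiger); substitute your own:

```shell
# sort=    order statements by, e.g., CPU spent executing or fetching
# explain= connect as this user and run EXPLAIN PLAN for each statement
# sys=no   omit recursive SQL issued internally as user SYS
hostname% tkprof ora_1776.trc ora_1776.out sort=execpu,fchcpu explain=scott/tiger sys=no
```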