Page 755
Obviously, reading from an index, then from the table, then back to the index, then back to the table, and so forth cannot be sequential, whether the table and index are on the same disk or are spread out in good tuning fashion. For queries and data like those in Listing 30.5, the index actually slows things down.
The index clearly outperforms a full-table scan for the queries and data like those in Listing 30.4. This is because you need to retrieve only a small number of rows (6), and the total number of read operations is only 12 (2 × 6). A full-table scan still must perform 6,000 read operations (the entire table), because a full-table scan has no knowledge of where individual pieces of data are stored; it must check every row. In fact, the queries and data in Listing 30.4 are classic examples of the need for an index, especially if executed often enough. The best way to sum up these two similar queries with different data counts is this: under rule-based optimization, a user or programmer has no way of knowing that a query and data like those in Listing 30.5 could perform better under cost-based optimization.
The next obvious question is "How do I specify the optimizer mode in Oracle?" You have learned about rule-based and cost-based optimization so far. You can specify your desired form of optimization at the instance, session, or statement level. To specify at the instance level, set the init.ora parameter OPTIMIZER_MODE to one of the following values:
CHOOSE: The optimizer chooses the cost-based approach if statistics are available (that is, have been gathered by the DBA). Otherwise, it resorts to rule-based optimization.
RULE: The optimizer uses the rule-based approach.
FIRST_ROWS: The optimizer chooses the cost-based approach (again, if statistics are available) to minimize response time, that is, to minimize the time to present the first rows to the screen. Use this mode if you have a highly interactive, screen-based application, as many OLTP and smaller DSS systems are.
ALL_ROWS: The optimizer chooses the cost-based approach (again, if statistics are available) to maximize throughput, that is, to minimize the total time needed to process all rows of the statement. Use this mode if you have a batch or large DSS system.
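At the instance level, the setting is a single line in the init.ora parameter file (the instance must be restarted for it to take effect). For example, to choose cost-based optimization whenever statistics exist:

OPTIMIZER_MODE = CHOOSE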
To specify at the session level, issue the following ALTER SESSION statement:
SQL> ALTER SESSION SET OPTIMIZER_GOAL=<value>;
where value is one of the previously mentioned optimizer modes (CHOOSE, RULE, FIRST_ROWS, ALL_ROWS). The setting lasts only for the current session and hence must be reissued in a future session if desired.
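For example, to favor fast initial response for the remainder of the current session:

SQL> ALTER SESSION SET OPTIMIZER_GOAL=FIRST_ROWS;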
To specify at the statement level, use hints as discussed earlier. The hints may be any of the optimizer mode values (CHOOSE, RULE, FIRST_ROWS, ALL_ROWS), or they may be one of the access paths shown in Table 30.2.
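A hint is embedded in a comment immediately following the SQL verb, introduced by a plus sign. For example, to request rule-based optimization for a single statement against the sample EMPLOYEES table (the column names here are for illustration only):

SQL> SELECT /*+ RULE */ name, salary
  2  FROM employees
  3  WHERE dept_no = 10;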
Table 30.2  Access Paths for Hints
Access Path | Description |
ROWID | Uses a ROWID scan for retrieval. |
CLUSTER | Uses a cluster key scan. |
HASH | Uses a hash index scan. |
INDEX | Uses an index scan. |
INDEX_ASC | Uses an index scan and scans in ascending order. |
INDEX_DESC | Uses an index scan and scans in descending order. |
AND_EQUAL | Uses multiple indexes and merges their results. |
ORDERED | Uses the order of the tables in the FROM clause as the order of the join. |
USE_NL | Uses the nested loops method for joining tables. |
USE_MERGE | Uses the sort-merge method for joining tables. |
FULL | Uses a full-table scan. |
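To force one of these access paths, name it in the hint along with the table (and, optionally, the index) it applies to. For example, assuming a hypothetical index EMP_DEPT_IDX on the EMPLOYEES table:

SQL> SELECT /*+ INDEX(employees emp_dept_idx) */ *
  2  FROM employees
  3  WHERE dept_no = 10;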
Table 30.3  Access Paths for Hints in Version 7.3 and Later
Access Path | Description |
CACHE | Tells Oracle to treat the table as a cached table, keeping its blocks in the SGA after a full-table scan for later quick access. |
HASH_AJ | Specifies type of join to use during an antijoin (Oracle 7.3 and later). |
MERGE_AJ | Specifies type of join to use during an antijoin. |
NO_MERGE | Tells Oracle not to merge the view's SQL syntax with the syntax of a query that uses the join. |
NO_CACHE | Marks blocks as "least recently used" so they get removed from SGA soon. |
NOPARALLEL | Disables parallel query execution for a table, even if the table was created with a PARALLEL clause. |
ROWID | Uses TABLE ACCESS BY ROWID operation. |
STAR | Uses a composite key/star query execution path when resolving a join. |
USE_CONCAT | Forces combined OR conditions in the WHERE clause to be transformed into a compound query using UNION ALL. |
USE_HASH | Uses a hash join. |
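These hints are used the same way as those in Table 30.2. For example, to request a hash join between two sample tables (the table and column names are illustrative):

SQL> SELECT /*+ USE_HASH(e) */ e.name, d.dept_name
  2  FROM departments d, employees e
  3  WHERE d.dept_no = e.dept_no;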
Also, to use either of the cost-based optimization modes (FIRST_ROWS, ALL_ROWS), the DBA must gather statistics periodically to keep the stored data distributions up-to-date. A DBA can either take all statistics or take some statistics with a (simple random) sample. To take all statistics means to do a full-table scan. To take a sample means to access some fraction less than the total number of rows. A DBA issues the following statement to take all statistics:
SQL> ANALYZE TABLE <table_name> COMPUTE STATISTICS;
This works fine for relatively small tables (fewer than a million rows). But for large tables, DBAs may want to opt for taking a sample. Statistically, as long as the sample is large enough relative to the total number of rows, it is sufficiently accurate not to require taking all statistics. To take a sample, a DBA issues the following statement:
SQL> ANALYZE TABLE <table_name> ESTIMATE STATISTICS;
This statement causes the sampling (by default) of up to 1,064 rows, regardless of how large the actual table may be. A DBA may specify a percentage by issuing:
SQL> ANALYZE TABLE <table_name> ESTIMATE STATISTICS SAMPLE 10 PERCENT;
This samples 10 percent, rounded to some whole number of rows, of the total number of rows in the table. For example, this samples 600 rows to use for estimates out of your 6,000-row EMPLOYEES table. A DBA may also specify the actual sample size, by issuing something like:
SQL> ANALYZE TABLE <table_name> ESTIMATE STATISTICS SAMPLE 1000 ROWS;
Clearly, this samples precisely 1,000 rows of the specified table.
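To confirm that statistics are in place, a DBA can query the data dictionary; columns such as NUM_ROWS and BLOCKS in USER_TABLES are populated only after an ANALYZE has been run. For example:

SQL> SELECT table_name, num_rows, blocks
  2  FROM user_tables
  3  WHERE table_name = 'EMPLOYEES';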
CAUTION |
If you specify a percentage greater than 50 (or a number of rows greater than 50 percent of the table), the ANALYZE command performs a full COMPUTE (a full-table scan) rather than your specified ESTIMATE. |
With Oracle versions 7.3 and later, the ANALYZE command has added a FOR clause. With this clause, you can name specific columns with FOR COLUMNS. For example, if only a few columns have changed since the last ANALYZE run, or you want to reduce the ANALYZE run time, specify FOR ALL INDEXED COLUMNS, which does exactly what it says by not taking statistics on non-indexed columns.
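For example, to recompute statistics for only the indexed columns of the sample EMPLOYEES table:

SQL> ANALYZE TABLE employees COMPUTE STATISTICS FOR ALL INDEXED COLUMNS;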
A DBA should run ANALYZE on a regular, periodic basis. However, what might be sensible for one application might not be for another. For example, a DSS that is uploaded from three feeder batch systems on a monthly basis only needs to have those tables analyzed (with COMPUTE) that have been loaded, and then only immediately following the load or at some reasonable time before their next usage. In contrast, a high-activity OLTP system, like flight reservation information, might need to have nearly all its tables, which may be modified on a minute-by-minute basis, analyzed (with ESTIMATE) on a short interval, such as every hour or less. These types of systems are probably mirrored, and one production copy should be analyzed offline while the other continues to serve, and then vice versa. In general, ANALYZE your tables as frequently as possible or at intervals when your tables change most, so that the