Previous | Table of Contents | Next

Page 752

Notice that the hint immediately follows the SQL command (SELECT, INSERT, UPDATE, DELETE). The hint comment, like any regular SQL*Plus comment, begins with a /* and is immediately followed by a plus sign (+), to make /*+ the start of a hint comment. The usual */ ends the hint comment. Also consider how the optimizer knows which index to use if more than one exists, especially if, for some reason, more than one exists on EMPLOYEE_ID. In the previous example, only /*+ INDEX */ is stated, and it is assumed that only one (primary key) index exists for this table. Hence, for this listing, there is no ambiguity. In cases where there is ambiguity, you would need to include the table and index name in the hint to give the optimizer specific information. Hints are discussed in more detail in the following sections.

NOTE
The optimizer will accept only one hint per statement or statement block. The optimizer will ignore any additional hints. Further, the optimizer will ignore any misspelled hints. There is no reported error, and you would have to use Explain Plan, discussed later in this chapter, to determine whether or not your hint was followed.n

Ranking Access Paths

The Oracle 7.x optimizer access paths, ranked by fastest to slowest, top down, are shown in Table 30.1.

Table 30.1Oracle Optimizer Access Paths, Rank-Ordered

Rank Access Path
1 single row by ROWID
2 single row by cluster join
3 single row by hash cluster key with unique or primary key
4 single row by unique or primary key
5 cluster join
6 hash cluster key
7 indexed cluster key
8 composite key
9 single-column indexes
10 bounded range search on indexed columns
11 unbounded range search on indexed columns
12 sort-merge join
13 MAX or MIN of indexed columns
14 ORDER BY on indexed columns
15 full table scan
Source: Oracle Corporation.

Page 753

So, when the Oracle optimizer is following the rule-based strategy, it uses the fastest access paths that fit the given query. For example, Listing 30.2 shows the same query from Listing 30.1 without the hint, and it has been modified with a > WHERE clause.

Listing 30.2An Example of a Query Without a Hint and an Unbounded Range Search

SQL> SELECT EMPLOYEE_ID
        2> FROM EMPLOYEES
        3> WHERE EMPLOYEE_ID > 500000;

If Oracle were using rule-based as its strategy and a primary key index existed on EMPLOYEE_ID on the table EMPLOYEES, it would use access path #11 (unbounded range search on indexed columns) for Listing 30.2 because the WHERE clause is unbounded. In other words, a WHERE clause is unbounded when it has a greater than (> or >=) operator and no corresponding less than (< or <=) operator, or vice versa. The range is not known until execution time.

If the WHERE clause is bounded (in effect, it has both operators), the finite range is known at parse time. To make an unbounded WHERE clause bounded, you could use the literal maximum (if you knew it), or at least the theoretical maximum for that column according to your business rules. However, you probably wouldn't want to use the MAX operator in addition to your original unbounded range search—if you look at the rank ordering, using MAX is #13, which is below our objective of improving an unbounded search (#11) to a bounded one (#10). Using the MAX operator with your original unbounded range search would only slow you down. But using the actual maximum, if statistically known, would do the trick. Follow these dos and don'ts, which are illustrated in Listing 30.3:

Listing 30.3Following the Dos and Don'ts of Rewriting an Unbounded Search to Make It Bounded

SQL> SELECT EMPLOYEE_ID
     2> FROM EMPLOYEES
     3> WHERE EMPLOYEE_ID > 500000 AND EMPLOYEE_ID <=678453;

SQL> SELECT EMPLOYEE_ID
     2> FROM EMPLOYEES
     3> WHERE EMPLOYEE_ID > 500000 AND EMPLOYEE_ID <=999999;
SQL> SELECT EMPLOYEE_ID
     2> FROM EMPLOYEES

     3> WHERE EMPLOYEE_ID > 500000 AND EMPLOYEE_ID  <= MAX(EMPLOYEE_ID);

Page 754

Analyzing Queries to Improve Efficiency

To no surprise, this general field of attempting to reformulate or rewrite queries to improve their efficiency is known as query rewriting. Query rewriting is covered in more detail in the sections "Identifying Typical Problems" and "Rewriting Queries." Our previous examples are standard fare and easy to understand. For example, to get the maximum (MAX) of a set of values in a column, it would seem to take close to the amount of time of a full-table scan—regardless of whether or not the column was indexed. And, as you can see by the rank ordering of the MAX access path, this seems to be the case. However, there are far less understandable and more exotic ways of rewriting queries that are not as obvious.

Now that you've studied rule-based optimization, what about cost-based optimization? Take a look at the two queries in Listings 30.4 and 30.5.

Listing 30.4 First of Two Similar Queries on the Same Table

SQL> SELECT EMPLOYEE
     2> FROM EMPLOYEES
     3> WHERE EMPLOYEE_TYPE='VICE PRESIDENT';

Listing 30.5 Second of Two Similar Queries on the Same Table

SQL> SELECT EMPLOYEE
     2> FROM EMPLOYEES
     3> WHERE EMPLOYEE_TYPE='PROGRAMMER';

Now, suppose you have a relatively large software company with six vice presidents and about 2000 programmers out of about 6000 total employees. The two preceding queries will explain the difference between rule-based and cost-based query optimization and why cost-based optimization is preferred. Assuming you have a non-unique index on the column EMPLOYEE_TYPE, rule-based optimization chooses access path #9 (single-column indexes) for both queries. On the other hand, cost-based optimization—given the data distributions that programmers account for 1/3 of the total number of rows and that vice presidents account for only 1/1000—chooses to use the non-unique index for the query in Listing 30.4, yet is intelligent enough to opt for the full-table scan (the worst case access path #15) for the query in Listing 30.5. The intelligence comes from the stored knowledge of the data distribution.

If the optimizer must access a significant fraction of all the rows of a table, a full-table scan is actually more efficient than an index scan. This is because scanning an index for a row and then retrieving that particular row requires at least two read operations per row, and sometimes more—depending on how many distinct data values are in the index. However, the full-table scan only needs one read operation per row. Multiplied times many rows, it is pretty clear why an index is slower when accessing a large amount of a table, compared to just reading through the entire table, as with the query in Listing 30.5. Aside from the total number of read operations, another major reason why a full-table scan is better than an index on retrieving large parts of a table is that most of the necessary reads are sequential or nearly sequential.

Previous | Table of Contents | Next