14.9 How do I…Structure data warehouses for star query optimization?

Problem

My database contains some data warehouse structures, each featuring one large table and several smaller base tables. Some queries against this structure require extremely long processing times. How can I improve Oracle8’s performance for queries against my data warehouse?

Technique

The classical data warehouse structure features one large table, called a fact table, where quantitative and categorical data for an ongoing business process resides. In most data warehouse implementations, this table is very large. Supplementing the fact table are several much smaller dimension tables, which store additional data about the grouping variables residing in the fact table. Entity relationship diagrams for a structure like this often surround the large fact table with the smaller dimension tables in a star pattern. The name “star schema” refers to this pattern.

Suppose, for the purposes of example, that computer help desk staff in a large company log requests for assistance in a database application. This application records the call duration, the software package that generated the question, and the caller’s department. After years of use, the help desk database may come to contain hundreds of thousands of records. It is conceivable that the organization may choose to store this large volume in a data warehouse to provide a historical perspective of help desk activity. Ad hoc queries against the data warehouse may not perform well if developers let the Oracle optimizer choose an execution path. Some additional indexing on the schema, however, can provide some performance advantages.

Four factors influence the Oracle8’s decision about star query optimization. They are the following:

For this example, assume the following about these factors:

At the start, the only indexes on the fact and dimension tables are those created by the CHP14_21.SQL script in Step 1.

The initialization parameter STAR_QUERY_ENABLED is set to the default value FALSE.

4 The STAR_TRANSFORMATION hint is not used.

4 The fact and dimension tables are not analyzed.

Steps

1. Run SQL*Plus, connect as the WAITE user, and use the START command to load and execute the script CHP14_21.SQL. The script and its output appear in Listing 14.4.

Listing 14.4 Running the CHP14_21.SQL script in SQL*Plus to create a sample data warehouse schema

SQL> start chp14_21

SQL>

SQL> CREATE TABLE emp14_9

2 ( emp_no number(5),

3 dept_name varchar2(20),

4 --

5 CONSTRAINT emp14_9_pk PRIMARY KEY (emp_no)

6 );

Table created.

SQL>

SQL> CREATE TABLE package14_9

2 ( package_no number(3),

3 package_name varchar2(20),

4 --

5 CONSTRAINT package14_9_pk PRIMARY KEY (package_no)

6 );

Table created.

SQL>

SQL> CREATE TABLE priority14_9

2 ( priority_no number(2),

3 priority_level varchar2(20),

4 --

5 CONSTRAINT priority14_9_pk PRIMARY KEY (priority_no)

6 );

Table created.

SQL>

SQL> CREATE TABLE helpcall14_9

( package_no number(3),

3 emp_no number(5),

4 priority_no number(2),

5 call_duration number(3),

6 --

7 CONSTRAINT helpcall14_9_pk

8 PRIMARY KEY (package_no, emp_no, priority_no),

9 CONSTRAINT helpcall_package_fk

10 FOREIGN KEY (package_no)

11 REFERENCES package14_9(package_no),

12 CONSTRAINT helpcall_emp_fk

13 FOREIGN KEY (emp_no)

14 REFERENCES emp14_9(emp_no),

15 CONSTRAINT helpcall_priority_fk

16 FOREIGN KEY (priority_no)

17 REFERENCES priority14_9(priority_no)

18 );

Table created.

SQL>

The script creates three dimension tables named PACKAGE14_9, EMP14_9, and PRIORITY14_9. It also creates a fact table named HELPCALL14_9. Each dimension table has one implicit primary key index. The fact table’s primary key is the result of concatenating the primary keys of the dimension tables. The fact table also contains foreign key references on the grouping attributes that refer back to the dimension tables. Lines 9 through 17 of the CREATE TABLE statement for the fact table implement the foreign key constraints.

2. Enable the AUTOTRACE option for the SQL*Plus session by issuing the SET AUTOTRACE ON command at the SQL*Plus prompt.

3. Use the START command to load and execute the script CHP14_22.SQL. The script appears in Figure 14.21. There is no data in the star schema so the query returns no rows, but the AUTOTRACE option results in the execution path report appearing in Figure 14.22.

The execution path, read from inside out and from top to bottom, indicates that the optimizer’s first step is an index scan followed by a table access by ROWID on the PRIORITY14_9 table. Oracle8 submits this output and the rows returned from a full table access on the HELPDESK14_9 fact table to a nested loops join. There is more to the execution path, but the damage has already been done. It is likely that the nested loops join will return enough rows to degrade the performance of the rest of the query.

4. Use the START command to load and execute the CHP14_23.SQL script. The script creates indexes on the dimension table columns, which are likely to appear in ad hoc queries against the HELPDESK14_9 fact table. Figure 14.23 presents the script and its output.

5. Execute the CHP14_22.SQL script again using the START command. The execution path from the script appears in Figure 14.24.

This new execution plan is quite different from the one that Step 3 generated. The optimizer uses the indexes on the non-key columns of the dimension tables to perform range scans and nested loops joins. The large fact table HELPDESK14_9 is uninvolved in the query until the last step and even then table access is by ROWID (instead of the full table scan we saw in Step 2). This is often referred to as a star query execution path.

How It Works

Step 1 creates a set of tables consistent with a classical data warehouse star schema architecture. Step 2 enables the AUTOTRACE option so SQL*Plus reveals the execution plan for subsequent SQL statements. The script in Step 3 follows the traditional execution plan, Step 4 creates indexes on the non-key columns of the dimension tables, and Step 5 shows the new execution path.

Comments

Star query execution can result in huge performance benefits for traditional and data warehouse applications. The performance benefits intensify if the fact table is extremely large. Be advised, though, that if the fact table is not appreciably larger than the dimension tables then a traditional execution path may perform as well as, and perhaps better than, a star query.

The online documentation states that the rule-based optimizer does not recognize star queries and that developers must ANALYZE tables to realize star query execution paths. Under the 8.0.2.0.2 beta release of the Oracle8 server for Windows NT, this is not true; even the rule-based optimizer recognizes star queries in the presence of indexes on the non-key columns of the dimension tables.

Ordering the tables in the FROM clause of a potential star query in the order of their primary key columns in the fact table’s primary key may induce the optimizer to use a star query. List the fact table last in the FROM clause and supply the ORDERED hint if the optimizer still ignores a star query execution path (see a full discussion of optimizer hints in How-To 14.7).

The bottom line is that the only way to tell that the optimizer has chosen a star query execution path is to review either SQL*Trace, EXPLAIN PLAN, or AUTOTRACE output. If the execution plan indicates the optimizer is using indexes other than the primary keys of the dimension tables then Oracle8 is executing a star query.

As always, pay close attention to elapsed time, query statistics, and the execution plan when bypassing the optimizer’s default behavior. Otherwise, you may “tune” yourself right into a performance problem.