4.6 How do I…Use the ANALYZE command to determine access statistics?

Problem

I am using Oracle’s Cost Based Optimizer by operating in CHOOSE mode, and want to improve query performance. I know that my objects must have statistics gathered on them for the CBO to work. How do I use the ANALYZE command to determine access statistics?

1 Technique

Use the ANALYZE TABLE command to gather statistics on a table and all associated indexes. The syntax of several different ANALYZE commands are shown in Figure 4.20. These will be used to gather and remove statistics for Oracle’s Cost Based optimizer.

There are two main options of the ANALYZE table command: ESTIMATE and COMPUTE. ESTIMATE looks at a sample of records in a table and determines various statistics about the data of the table, which is explained later in this How-To. Although ESTIMATE looks at only a sample of data, usually 25% or less, COMPUTE looks at every record in a table. Although this takes considerably more time, the statistics that it gathers may be more accurate. Which method to use will depend on which is more important—the speed to collect statistics or the accuracy of the statistics.

Steps

1. Connect to SQL*Plus as the WAITE account. If you have not done so already, create the EMPLOYEE04 table by running the CHP4_1.SQL script. At this point, there will be no records in the table. To get a better grasp of how statistics on tables work, create some sample data in the table.

2. Run CHP4_18.SQL, shown in Figure 4.21, to create sample data.

SQL> START CHP4_18.sql

1 row created.

1 row created.

2 rows created.

4 rows created.

8 rows created.

16 rows created.

SQL>

3. At this point, 32 records will be in the EMPLOYEE04 table but no statistics. Run CHP4_19.SQL to estimate statistics, sampling 25% of the records.

SQL> GET CHP4_19.sql

1 ANALYZE TABLE EMPLOYEE04

2 ESTIMATE STATISTICS SAMPLE 25 PERCENT;

SQL> /

Table analyzed.

SQL>

4. Now that the table has been analyzed, you may view the statistics that have been gathered. Run the CHP4_20.SQL script to see a few statistics.

SQL> GET CHP4_20.sql
1SELECT NUM_ROWS, AVG_SPACE, SAMPLE_SIZE, AVG_ROW_LEN
2FROM DBA_TABLES
3 WHERE TABLE_NAME = ‘EMPLOYEE04’
SQL> /
NUM_ROWS AVG_SPACE SAMPLE_SIZE AVG_ROW_LEN
-------- --------- ----------- -----------
32 1574 25 45

SQL>

How It Works

CHP4_18.SQL inserts sample records into the EMPLOYEE04 table. First, it creates one record. Then, it inserts into the table by selected all records from itself, several times. Thus, the first time the INSERT INTO EMPLOYEE04 SELECT * FROM EMPLOYEE04 command is issued, there is one record, so one record is inserted. The next time the statement is executed, there are two records in the table, so two records will be inserted. This method is a good process to quickly add sample data to a table.

CHP4_19.SQL uses the option SAMPLE 25 PERCENT. This will be much quicker than the COMPUTE option, but will potentially not be as accurate for the Cost Based Optimizer.

The statistics may be seen by querying the DBA_TABLES view, whose columns are described in Table 4.1. The CHP4_20.SQL script selects just a few of these columns, which can be expanded by any of the DBA_TABLES columns.

Column Name Description
OWNER User account that created the table.
TABLE_NAME Name of the table.
TABLESPACE_NAME Tablespace in which the table resides.
CLUSTER_NAME Name of the cluster for which the table belongs, if any.
IOT_NAME The parent Index-Organized table, if any.
PCT_FREE Percentage of space to keep free within each block of the table.
PCT_USED Percentage of space to use within each block of the table before allocating a new block.
INI_TRANS Initial number of transactions within each block.
MAX_TRANS Maximum number of concurrent transactions that can use the block.
INITIAL_EXTENT Size, in bytes, allocated for the first extent of the table.
NEXT_EXTENT Size, in bytes, allocated for subsequent extents.
MIN_EXTENTS Minimum number of extents to allocate when creating or truncating the table.
MAX_EXTENTSMaximum number of extents.
PCT_INCREASE Percent that each extent past the INITIAL and NEXT will increase in size by.
FREELISTS Number of freelist buffers allocated for the table; set to the maximum number of concurrent insert processes you predict for the table.
FREELIST_GROUPSParallel Server option to determine number of freelist groups for all instances inserting into the table.
LOGGING YES or NO value to determine if logging is used. If not, the table can be loaded in direct mode, created “AS SELECT … FROM table_b”, and other options quicker.
BACKED_UP Specifies whether a table has been backed up since last modified.
NUM_ROWS Number of records in the table. This is one of the most important statistics for the CBO.
BLOCKSNumber of used blocks of the table.
Column Name Description
EMPTY_BLOCKS Number of blocks that contain no data.
AVG_SPACE Average free space in the table.
CHAIN_CNT Number of chained rows.
AVG_ROW_LEN Average length, in bytes, of a row.
AVG_SPACE_FREELIST_BLOCKSAverage freespace of blocks on a freelist.
NUM_FREELIST_BLOCKS Number of blocks on a freelist.
DEGREE Parallel Query option that indicates the number of instances the table is to be scanned across.
INSTANCES Number of instances the table is to be scanned across.
CACHE Y or N: if the table will be cached in the buffer cache.
TABLE_LOCK ENABLED or DISABLED: is table locking enabled?
SAMPLE_SIZE Shows what percent estimate was used in the ESTIMATE clause of the ANALYZE command.
LAST_ANALYZED Date and time the table was last analyzed. Important to see if statistics need to be refreshed.
PARTITIONED YES or NO: is the table partitioned.
IOT_TYPE IOT or IOT_OVERFLOW or : determines if this is an index-organized table (or overflow of one). indicates a normal table.
TEMPORARY Y or N: is the table temporary?
NESTED YES or NO: is this a nested table?
BUFFER_POOL DEFAULT or other pool name: displays the default buffer pool in which the object will be placed.

Comments

Another method to analyze statistics is to issue EXECUTE DBMS_UTILITY. ANALYZE_SCHEMA(schema_name, option);, where the options are COMPUTE, ESTIMATE, or DELETE. Do not analyze data dictionary objects owned by SYS. Oracle Corp. has tuned the database to use the rule-based optimizer for these internal tables, and by analyzing them, the database will access the data dictionary in cost-based mode, possibly affecting performance.

The user account must have the ANALYZE ANY or be the owner of the table/index/cluster to analyze an object. ANALYZE can also be used to determine chained rows of a table, using the LIST CHAINED ROWS INTO chained_rows option. This is helpful in determining if the PCTFREE/PCTUSED storage parameters are set properly. You must run the UTLCHAIN.SQL script before this option may be used.

Using the COMPUTE option against a table is an intensive operation. Be sure to have sufficient free space in the default temporary tablespace for the user account issuing the command. The space should be as large as the object being analyzed.