Previous | Table of Contents | Next

Page 649

CHAPTER 26

Parallel Query Management

In this chapter

Page 650

Introduction

Large systems with significant memory and CPU resources were available for decades. However, these systems were available only with proprietary operating systems and were not very cost effective. Multiple CPU machines with UNIX open system architecture were available in the early '90s. These machines had significant hardware resources and were comparatively cheap. Oracle introduced Parallel Query Option in Oracle version 7.1 to effectively use the hardware resources available in these systems. Oracle Parallel Query Option allows long running SQL operations, mainly queries, to be spread among multiple CPUs in a coordinated fashion. This reduces the elapse time for execution of resource-intensive SQL operations.

The Parallel Query Option allows certain operations to be performed in parallel by multiple server processes. One process, known as the query coordinator, dispatches the execution of a statement to several servers, coordinates the results from all the servers, and sends the results back to the user.

Although the feature is generally referred to as PQO, it also includes:

Parallel Load

The SQL*Loader direct path allows you to load data into the same table or partition simultaneously, using multiple SQL*Loader sessions. For example:

sqlload saledba/saledba control=sales1.ctl parallel=TRUE direct=TRUE

sqlload saledba/saledba control=sales2.ctl parallel=TRUE direct=TRUE

sqlload saledba/saledba control=sales3.ctl parallel=TRUE direct=TRUE

Note the use of keywords parallel and direct in the command line. Also note the use of three different input data files for each of the sessions. Other important features of parallel load are:

Page 651

As mentioned above, each parallel load session loads data into new extent(s) and does not use any existing extents, even if they do not contain any data. Thus, the initial extent of a table that is loaded only by parallel loads is never used. To optimize disk space usage, you should either create a very small initial extent or put data into the initial extent by loading it without the parallel option.

Parallel Recovery

Oracle's basic read-write unit is a data block. Whenever changes are made to a block, Oracle records these changes in the form of a redo log so that the block can be reconstructed using these logs if needed. Due to media failure or for any other reason, if the contents of the present data file(s) are lost, the file is restored from a suitable backup copy and then recovery is done. The recovery process involves:

To perform recovery in parallel you should set the initialization parameter RECOVERY_ PARALLELISM. Alternatively, you can use the PARALLEL clause of the RECOVER command. During parallel recovery, the Server Manager or SQLDBA session reads the redo log file and passes on the changes to the parallel server processes. These processes then read the corresponding data file(s) and apply the changes.

Parallel Propagation (Replication)

Replication allows you to maintain multiple images of one or more database objects in multiple databases. Oracle Server transfers data over database links between these databases to propagate changes made to the replicated objects. The SNP background processes perform the data replication. However, if the data volume to be replicated is significantly large, it might take longer to synchronize the objects. Oracle8 allows parallel replication, whereas multiple parallel server processes can be used to propagate transactions.

Parallel SQL Execution

An Oracle database is a collection of physical data files that are manipulated through various processes. A set of such background processes and a shared memory segment, collectively called Oracle Instance, enable concurrent users to interface with the database. When a user wants to use (select, insert, update, delete) the data in the database, he or she needs to connect

Page 652


to the database. On most of the UNIX systems, Oracle uses a two-task architecture. In this scheme, when a user connects to the database, an additional process (frequently known as the shadow process) is forked. The shadow process accesses the shared memory segment and the datafiles on behalf of the user.

NOTE
In a multi-threaded server (MTS) environment, when a user logs on to the database, a shared server process is used instead of the shadow process. During Parallel Query Option this shared server process acts as the query coordinator.n

Oracle stores the data in the data files in blocks. A data block is Oracle's basic data input/output unit. When the user uses data, the corresponding data block is read by the shadow process in Oracle's buffer cache (part of the shared memory segment). The block in the buffer cache can then be modified by the user shadow process. Modified data blocks (dirty buffers) are written back to the disk by the DBWR. Thus, a shadow process does the major part of the work required by the user. Operations like parsing the SQL statement, sorting the data, and so on are also done by the shadow process.

This works very well in an OLTP system where the amount of data manipulated by a user is relatively small. However, in a DSS environment, when a user generally processes huge amounts of data, the shadow process might take a while to do the required work. While the shadow process is working long and hard to manipulate the data, it is quite possible that the system has idle CPU memory resources. This is where Oracle's Parallel Query Option comes in handy. With this option, Oracle can divide a user's large data processing request into multiple, comparatively small units of work that are then concurrently executed by different
processes. It uses a set of dedicated background processes, known as parallel query slaves (servers), to do such work. The shadow process gets promoted to a management role and is called the query coordinator. It is the responsibility of the query coordinator to:

  1. Break down the functions into parallel pieces.
  2. Ensure that a sufficient number of parallel query slaves are available.
  3. Initialize the server process for the given work and assign the work among the PQO server processes.
  4. Gather the output from the slave processes and return the output.
  5. Once the desired work is finished, free the query servers, making them available for other work.

Under favorable conditions, parallel execution reduces the execution time by as many factors as the number of query slaves it uses. However, it does not reduce the total CPU time taken by the SQL statement. Parallel execution uses more memory than serial execution, so if the
machine does not have spare CPU and memory resources, it may not yield desired scalability.

Previous | Table of Contents | Next