Page 38
And if you need only one machine for a relatively small database of <10 GB, consider NT. On the other hand, if you only need one machine but the storage is large or high performance is required, consider UNIX. Finally, if you don't really have a predominant environment or you're just starting up new (or can treat your purchase in that respect), by all means consider both Windows NT and UNIX and let all the factors come into play except the environment factor.
Aside from secondary storage, disks, and RAID, which we emphasized when discussing the storage hierarchy, we need to consider other hardware issues and operating system (OS) components, such as memory and CPU.
NOTE |
What we mean by memory is often referred to as core memory, physical memory, main memory, or random access (RAM) memory. These are all terms for the same thing, so we'll just use memory. n |
Essentially, memory is very fast electronic storage. It stores instructions and data. For a DBMS, the most important thing is that the OS can yield some of its memory to it. Then the DBMS can do with it what it will. And it does. This is why a DBMS is sometimes referred to as a micro-OS, or an OS within an OS, or an OS on top of an OS. In essence, a DBMS takes care of itself with regards to the care and feeding of its resource needs, albeit in deference to and cooperation with the OS. This is often done through a capability known as shared memory, especially in UNIX. (See Appendix A, "Oracle on UNIX," for more details.)
Locking, a key component to DBMSs, is also handled through memory structures. Shared resources are secured one at a time from competing processes. A DBMS either handles its own locking, does it partially with the OS, or yields locking duties to the OS.
Once an OS yields some of its memory to the processes that constitute a DBMS, the DBMS takes it from there, storing in that memory space its own instructions (code caching) and data (data buffering). Without getting too much into Oracle's architecture, which will be discussed later, let us broadly map what we have discussed. Oracle's memory access is based on the allocation of its System Global Area (SGA). The SGA contains a structure known as the data block buffers (data buffering) and shared pool. The shared pool contains the library cache (code caching), as well as the data dictionary cache. Undo (rollback) blocks are buffered within the data block buffers, and redo is buffered in its own redo log buffers section. These components are all configurable through Oracle's parameter file, init.ora. More on this later.
RDBMSs have come a long way with regards to CPU utilization. As mentioned earlier, most database systems of the past have tended to be I/O-bound. However, with VLDBs and OLAP/MDD systems, more and more database systems are memory- or CPU-bound. With VLDBs, memory is a bottleneck because the amount of memory is usually too small to be of use with huge amounts of data. With heavy analytical or scientific systems, or even DW systems, CPUs can be the bottleneck due to the enormous, concurrent computational demands.
Page 39
With the advent and evolution of multiprocessor machines within the last 10 years, many things have changed. Very large memory is now possible (>= 10s of GBs). Also, CPU architectures and speeds have advanced considerably. Word sizes are 32-bit and 64-bit now, with clock speeds around 200 to 300 MHz at the time of this writing. And these CPUs have pipelined architectures, permitting multiple instructions per clock tick (CPU step). But the important thing is that the RDBMS software has followed suit.
Oracle and the other major RDBMS vendors have likewise rewritten their code over time to take advantage of these hardware advances. Aside from shared memory and very large memory, multiprocessing is the major advancement and refinement of recent years. Two major classes of multiprocessors exist now:
In SMP machines, such as those Sun offers, the CPUs use shared memory and other internal hardware items such as buses. SMPs now have up to 64 processors. MPP machines have a shared-nothing architecture and are like micro-LANs, or Local Area Networks in a box. MPPs can have hundreds or thousands of processors.
RDBMS software now is either fully multithreaded or pseudo-multithreaded to take advantage of the processing power of the multiprocessing machines. To an operating system, an RDBMS is just one or more processes. Multithreading is a piece of software's capability to run multiple subprocesses, or threads, within its same parent process environment. Sybase and Informix, for example, are fully multithreaded. Oracle is pseudo-multithreaded when using the MultiThreaded Server (MTS) option; otherwise it is single-threaded. A DBA simply needs to know his number of CPUs, and he can configure the Oracle MTS. Other Oracle parameters are affected by the number of CPUs, such as certain locking structures known as latches.
What are the major principles of physical database design? Well, we have alluded to the fact that physical database design is actually pre-tuning, or nothing more than the second stage of tuning. (Logical database design is the first stage.) It should not be surprising, therefore, to learn that the major physical database design principles are essentially the same as the major performance tuning principles, except that we are dealing with the database before and during its construction, rather than after. There are many design principles, but the major ones always include the following: