Page 29
Page 30
Before we discuss physical database design, and later, performance tuning, it is important to cover the major application types. First, let's clarify the terms transaction and query. In database theory, broadly speaking, a transaction is a single, atomic SELECT, INSERT, UPDATE, or DELETE. However, with regards to application types, a transaction is generally more loosely defined as a business transaction, possibly containing multiple INSERTs, UPDATEs, or DELETEs. In addition, DML truly refers to SELECT, INSERT, UPDATE, and DELETE. However, you will find "DML," like "transaction," in this context often used to mean only INSERT, UPDATE, and DELETE operations. In sum, DML and transaction usually mean write-only, or modify-only. To distinguish the SELECT operation as read only, the term query is used.
We'll follow these latter industry conventions for the sake of common understanding, although they are quite confusing and, in fact, at conflict with their real definitions. That said, there are three main application types in the world of database systeqÓ applications:
OLTP: On-Line Transaction Processing. An OLTP system is an application that contains heavy DML, transaction-oriented activity; primarily updates, but also inserts and deletes. Classic examples are reservation systems such as those used by airlines and hotels. OLTP systems can have high concurrency. (In this case, high concurrency typically means many users simultaneously using a database system.)
DSS: Decision Support System. A DSS is typically a large, read-only database with historical content, and is generally used for simple canned or ad hoc queries. Often a DSS grows into VLDB, DM, or DW in the manner discussed in Chapter 1, "Databases, DBMS Principles, and the Relational Model." A good example of a DSS is a database behind an organization's intranet.
Batch: A batch system is a non-interactive, automatic application that works against a database. It usually contains heavy DML and has low concurrency. (In this case, low concurrency typically means few users simultaneously using a database system.) The ratio of querying to transactions determines how to physically design it. Classic examples are production databases and operational databases relative to DWs.
Some less common application types include:
OLAP: On-Line Analytical Processing. An OLAP system offers analytical services, as the name implies. This means mathematics, statistics, aggregations, and high computation. An OLAP system doesn't always fit the OLTP or DSS molds. Occasionally, it is a cross between the two. In addition, some people simply view OLAP as an extension or an additional functional layer on top of an OLTP system or DSS. ROLAP stands for Relational OLAP. This term doesn't really add much in the way of classification, though. An OLAP tool is often tightly coupled with a MDD (discussed in Chapter 1), and sometimes it is simply layered on top of a modified RDBMS. A demographic database for social statistics is a good example.
VCDB: Variable Cardinality Database. This type of database is frequently a back-end for a processing system that causes the tables in that database to grow and shrink considerably during the processing phase, which otherwise may be constant or periodic.
Page 31
Cardinality refers to the number of rows in a table at a given time. Some tables may be static lookup tables, but most are definitely highly variable in their number of records. Good examples are any databases that record short-lived activities, such as a security authorization database.
Quantitative estimating of any sort is an attempt to quantify, or measure, some process or product. With databases, the two main types of quantitative estimates are transaction analysis (sometimes referred to as volume analysis) and sizing analysis.
Transaction analysis is simply putting numbers on the database system. Different measures mean different things, and apply to certain kinds of database systems. The most typical measures, or metrics, include the minimums, averages, or maximums of the following:
There are many more metrics, such as number of rows affected by an operation, but these will offer some insight.
Usually, these measures have more meaning if they are expressed in the context of a given time period or activity. For example, it is usually more useful to know the maximum number of transactions per second than the cumulative number of transactions. The latter tells us little about the typical stress, or load, on the database. These numbers also mean more in the context of what kind of application your database serves.
If your application type is an OLTP system, the number of concurrent users, transactions per second, and response time are more important, because concurrence is the prime issue with OLTP.
If you have a batch system, elapsed time and number of concurrent programs is perhaps most important. A DSS might require you to know the bytes read per some unit of time, among other things.
You need to ask these questions as a DBA and gather your best possible answers, for this will affect your physical design and your performance tuning. In addition, these figures are largely the same measures that are used for benchmarking efforts. What we're trying to do here, though, is gather estimated, prototyped, or piloted numbers before the system is actually built, in order to help build it.
Page 32
Sizing is perhaps a more widely known activity, if not widely practiced often enough by all DBAs. Whereas in transaction or volume analysis, we ask "How often?" and "How much?" with regards to processes and data flow; with sizing we ask "How much?" with regard to data storage.
The fundamental thing is simply that a table with n rows of b max bytes per row will need at least n¥b bytes of storage. Of course, we've left out overhead. And this calculation varies considerably with the vendor of choice. Oracle, for example, offers a fairly complicated set of steps, as do other vendors, to help size a table, an index, or other structures. The best recommendation is to place this formula into a spreadsheet once, and you'll never have to do it again. Just pull it out and dust it off every time you need to do sizing for another database project. Just plug in the relevant input numbers, such as block size, block parameters, number of rows, column sizes, and so on. This way, you'll be able to subtotal by table, by sets of tables, by indexes, by sets of indexes, and for the whole database.
Then a seasoned DBA will add a fudge factor on top of that estimate to account for any mistaken underestimates, accidental oversights, and unanticipated future changes. Something like the final estimate size¥120 percent is not unreasonable.
Remember, too, that the figure you size for is usually based on tables and indexes alone. As with all RDBMS vendors, there are many more architectural considerations, to be discussed in Chapters 5 and 6, which will add to the overall system size. And whatever final figure of bytes you come up with, remember that this is the usable space you need, not the amount of raw disk space. Low-level hardware and a high-level operating system can subtract from the initial unformatted (raw) size of the disk, leaving less usable space than you'd think.
For example, formatting can consume 15 percent of a 4GB disk. This only leaves 85 percent, or 3.4GB. If your final sizing estimate was 20 GB, and you didn't take this into account, you'd purchase 5¥4GB disks and yet have only 5¥3.4GB, or 17GB, of usable space. You'd need another disk.
Denormalization refers to dropping the level of your tables' normal forms back down a few notches for physical design, performance tuning, or other reasons.
Recommendation: Don't do this unless you have very good reasons. A shortage of disks with no budget for new ones may be a good reason. Expecting to have poor performance without evidence, as opposed to actually having poor performance, is not a good reason. In fact, even poor performance itself does not immediately indicate the need to back off on your logical design. Your first step should be performance tuning. Denormalization should always be a last resort.