Maintaining and Enforcing Integrity

The integrity of data refers to its consistency and correctness. For data to be consistent, it must be modeled and implemented the same way in all of its occurrences. For data to be correct, it must be accurate and meaningful.

One way a DBMS maintains integrity is by locking a data item in the process of being changed. A database usually locks at the database page level or at the row level. Incidentally, locking also permits concurrency, which we'll cover next.

Another way a DBMS enforces integrity is to replicate a change to a piece of data if it is stored in more than one place. The last way a DBMS enforces integrity is by checking the data values being entered or changed to ensure that they fall within required specifications (for example, a range check).

If proper modeling and implementation practices are followed (discussed later in Chapter 2, "Logical Database Design and Normalization"), the DBMS helps to enforce this integrity automatically, for example, through a trigger or constraint. Without integrity, data is worthless. With integrity, data is information. Integrity not only enhances data, but also gives data its value.
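As a small sketch of the range check and constraint ideas above, the following uses SQLite through Python's standard sqlite3 module (an illustrative choice, not one the text prescribes); the table and column names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The CHECK constraint is a declarative range check: the DBMS itself
# refuses any salary outside the specified bounds.
conn.execute("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        salary REAL NOT NULL CHECK (salary BETWEEN 0 AND 500000)
    )
""")

# A value inside the allowed range is accepted.
conn.execute("INSERT INTO employee VALUES (1, 55000)")

# A value outside the range violates the constraint and is rejected
# before it ever reaches the table.
try:
    conn.execute("INSERT INTO employee VALUES (2, -100)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The point is that the rule lives in the database itself, so every application that touches the table is held to it, with no extra application code.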

A DBMS must manage concurrency when it offers multiuser access. That is, when more than one person at a time must access the same database, specifically the same pieces of data, the DBMS must ensure that this concurrent access is possible. Concurrent here means simultaneous in the looser sense: two or more users access the same data within the same time period.

The methods behind how a DBMS does this are not too complex, but the actual programming behind them is. Essentially, when two or more people want simply to look at the same data, without changing it, all is well. But when at least one person wants to change the data and others want to look at it or change it too, the DBMS must store multiple copies and resolve all of the changed copies back into one correct piece of data when everyone is done.

We mentioned one aspect of concurrency management already: locking. Generally speaking, the finer-grained (smaller) the lock, the better the concurrency (that is, more users have simultaneous access without having to wait). Rows are typically smaller than the smallest database page or block. Hence, row-level locks serve short, random data transactions better, and block-level locks may serve long, sequential data transactions better.
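Lock granularity can be seen at its coarsest extreme with SQLite, which locks the entire database file for a write (coarser even than the page-level locking described above); this Python sketch, an illustrative assumption rather than anything from the text, shows one writer blocking another until it commits:

```python
import os
import sqlite3
import tempfile

# Two connections act as two concurrent users of the same database file.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
a = sqlite3.connect(path, timeout=0.1)
b = sqlite3.connect(path, timeout=0.1)

a.execute("CREATE TABLE t (x INTEGER)")
a.commit()

a.execute("INSERT INTO t VALUES (1)")      # 'a' now holds the write lock, uncommitted
blocked = False
try:
    b.execute("INSERT INTO t VALUES (2)".replace("INTO t", "INTO t"))  # 'b' must wait
except sqlite3.OperationalError:
    blocked = True                         # "database is locked": 'b' timed out waiting

a.commit()                                 # 'a' releases its lock
b.execute("INSERT INTO t VALUES (2)")      # now 'b' can write
b.commit()
```

With row-level locking, the second insert would not have had to wait at all, which is exactly the concurrency benefit of finer-grained locks described above.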

This is how concurrency and integrity are linked. When a person wants to look at or change a piece of data, that person is performing a transaction with the database.

Understanding Transactions

A DBMS has, as part of its code, a transaction manager whose sole purpose is to manage concurrency and ensure integrity of transactions. The transaction manager has a tough job because it must allow many people to access the same data at the same time, and yet it must put the data back as though it had been accessed by one person at a time, one after the other, which ensures its correctness. Therein lies the fundamental answer as to how a DBMS must resolve all those multiple copies of data. Transactions occurring during the same time period can preserve the accuracy of the data if (and only if) they are serializable. Simply put, the DBMS must rearrange them so that the net result of all the changes is as if they all occurred single file.

The transaction is a unit of concurrency, or a unit of work. Nothing smaller than a transaction can occur; that is, no one can halfway change a piece of data. All transactions must be atomic: each individual transaction either completes entirely or not at all. Until modern twentieth-century physics came along, the atom was thought to be the smallest unit of matter. Likewise, the transaction is the smallest unit of concurrency. It is all or nothing. A transaction that completes is said to be committed, and one that does not is rolled back.

The DBMS handles recovery using transactions as units of recovery. Normal completions, manual requests for aborts, and unexpected aborts all require the DBMS to again call upon its multiple copies of data to either commit or roll back the data. A transaction log is kept by the DBMS for the purpose of rolling back (undo), and also for rolling forward (redo). A rollback is an undo operation. A rollforward is a redo operation that takes place when, for example, a committed transaction doesn't make it from memory to disk because of a hardware or software failure. The DBMS simply redoes it. Hence, the key to transaction recovery in a DBMS is that a transaction must be atomic and can be done, undone, or redone when necessary.
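The commit-or-rollback behavior above can be sketched with SQLite from Python (again an illustrative choice; the transfer function and account table are hypothetical). A failure partway through the transaction triggers a rollback, and the undo leaves the data exactly as it was:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("alice", 100.0), ("bob", 50.0)])
conn.commit()

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move funds between accounts as a single atomic transaction."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        if fail_midway:                    # simulate a crash between the two updates
            raise RuntimeError("simulated failure")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.commit()                      # both changes become permanent together
    except RuntimeError:
        conn.rollback()                    # undo: the half-finished debit is reversed

transfer(conn, "alice", "bob", 80, fail_midway=True)   # rolled back; nothing changes
transfer(conn, "alice", "bob", 80)                     # commits; both sides change
```

After the failed call, alice still has her full balance; only the second, completed call moves the money, which is the all-or-nothing property in action.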

Communicating with the Database

A DBMS is no good if you can't talk to it. How does one talk to a DBMS? Through an access or query language. The Structured Query Language (SQL) is the predominant query language today. It works mostly with the predominant type of DBMS that we will discuss shortly, the Relational DBMS (RDBMS). All communication to and from the database should pass through the DBMS, and to do this, we use SQL or something like it. DBAs use query languages to build and maintain a database, and users use query languages to access the database and to look at or change the data.
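A minimal sketch of both uses of SQL described above, issued through Python's sqlite3 module (an illustrative host language; the customer table is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DBA-style SQL: build the database by defining a table.
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT)")

# User-style SQL: change the data, then look at it.
conn.execute("INSERT INTO customer (name) VALUES ('Ada')")
rows = conn.execute("SELECT cust_id, name FROM customer").fetchall()
print(rows)   # [(1, 'Ada')]
```

The same language covers definition (CREATE), modification (INSERT), and retrieval (SELECT), which is why SQL serves DBAs and end users alike.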

Understanding an RDBMS

In 1970, E. F. Codd fathered the concept of the relational model. Before RDBMSs like DB2 were born, hierarchical (IMS) and network (IDMS) models were commonplace. Before these models, databases were built using flat files (operating system files, not necessarily flat!) and third generation language (3GL) access routines. In fact, some customized systems are still built this way, justified or not. Many of these legacy databases still exist on mainframes and minicomputers. CODASYL (from the COnference on DAta SYstem Languages) was a database standard created by the Database Task Group (DBTG). This was a COBOL-based network database standard, and IDMS was one vendor implementation. Since the seventies, however, RDBMSs have come to dominate the marketplace, with products such as Oracle, Sybase, Informix, and Ingres.

Recently, object-oriented (OO) DBMSs have come into the foreground and found many niche applications, such as CAD/CAM, engineering, multimedia, and so forth. OO DBMSs filled those niches because their strengths are handling complex data types in an almost