Visual Basic Expert Solutions

book coverQUE

Chapter 10

Structured Query Language

By Steve Potts


The Structured Query Language, SQL (pronounced SEE-kwal), is the industry standard database manipulation language. This language enables you to manipulate almost every major database management system in widespread use at this writing.

For the Visual Basic programmer, SQL is important because it is the primary means of communication between Visual Basic and both the Jet database engine and the Microsoft SQL Server database engine that runs under Windows NT Server. While it is possible to create some applications with Visual Basic and the Data Access Object, substantial applications will require that you learn SQL.

SQL traces its genealogy back to the "olden" days of relational database theory. In 1970, Dr. E.F. Codd, then an IBM employee, noticed a theoretical similarity between relational mathematics and the problem of manipulating a database. He wrote a famous paper explaining these similarities, and thus invented the relational database. The first language that Dr. Codd and his assistants at the IBM San Jose laboratory created to implement this theory was a primitive form of SQL. While it differs from modern SQL in important ways, it was the seed which later caused the SQL language to be popularized.

Unlike Visual Basic itself, SQL is a published American National Standards Institute (ANSI) standard (X.3.135-1992). Most of the SQL implementations in use today follow this standard fairly closely. However, because the standard doesn't address 100 percent of the issues needed for a complete implementation of a database management system, the different dialects of SQL differ slightly. While these slight differences complicate the porting of your applications from one database management system (DBMS) to another, they do not become an issue often during the learning of the SQL syntax and implementation strategy. Most SQL dialects contain some useful features that are missing from the ANSI standard. For example, Microsoft's SQL Server product uses a dialect of SQL called Transact SQL. This language contains many more reserved words than the SQL standard calls out. It even supports conditional execution and loops within the SQL statement. Access SQL supports the Transform and Pivot statements that process crosstab queries. These features are useful, but they should be avoided if you plan to port your application to another database management platform that supports its own dialect of SQL.

The goal of SQL is to provide a functional interface between the user and the functionality of the database management system. This is an important concept. The inner workings of a modern database management system are extremely complex. Pointers, multiple processes, indexing schemes, and buffers abound. The goal of SQL is to provide the user with a simple way of manipulating the database, while allowing the database management system development team the freedom necessary to create the complex system needed for good performance.

This chapter approaches the SQL language from the perspective of the Visual Basic programmer. We will drive our understanding beyond the superficial level normally found in Visual Basic books. This approach serves the following two purposes:

The subject of this chapter is the SQL language. Therefore, the following topics will be addressed:

In each section you will create simple, but real applications to illustrate the language concepts.

Global View of SQL

SQL is a text-oriented, fourth-generation language. Fourth-generation languages are called nonprocedural languages because the statements tell the database management system what you want it to do, without telling it how to do it. The SQL syntax doesn't say, "Go to the first row of the table. Move that row to a buffer. Go to the second row. Move that row into the buffer also. Test if the third field of each row is greater than 17. If it is, delete it from the buffer. Get the next row... " The SQL statement to accomplish this will be SELECT * FROM Table1 WHERE id < 17;.

It is true that the computer cannot understand instructions at the SQL level. They must be translated to a level as low or lower than the instructions in the preceding paragraph. The important thing to understand is that it is the task of the database management system vendor—not the application programmer—to write that code. This greatly increases the speed at which applications can be developed, and also enlarges the pool of people that can develop these applications. Many intelligent people who can enjoy writing SQL statements who would not enjoy coding database calls.

This does not mean that SQL is a "wimpy" language that never gets complex. Complex queries are needed at times, but the majority of the SQL routines that you will write are fairly straightforward.

Understanding Set-Orientation

Set-oriented languages differ from traditional array-oriented languages in the way that they treat data. Traditional database management systems (such as IMS on the mainframe) assume that the receiver of the data is using the same computer that the database management system is using. These products are built to provide data to the application where the application logic can perform the task of selecting the data that is needed by the application. This strategy breaks down when the database management system is located on a different computer from the application. The strategy of culling out the unwanted data at the application computer can cause a performance problem over the network. For example, if your application wanted to retrieve the record for one of your company's 175,000 employees whose salary was over $30,000, the array-oriented query language would retrieve all 175,000 records, send them to your application, which would then throw away half of them (those employees with salaries under $30,000). This approach would choke a network for half an hour.

A set-oriented database management system is intelligent enough to perform all of the selections before sending the data that passes the test over the wire. If you send a request to SELECT * FROM employee_table WHERE employee_num = "G546897";, the SQL-based database management system is smart enough to pass you only the one row that satisfies the request. If the request reads "SELECT * FROM employee_table WHERE salary > 30,000;", then the database management system will cull out all rows in the table which fail this test, and pass the rows that pass the selection criteria over the network.

One of the differentiating skills of the SQL programmer is her ability to make SQL do as much of the work of selecting rows as possible. Always keep network traffic issues in mind when designing your SQL-based applications.

Types of SQL-Based Applications

Most SQL-based database management systems support a variety of ways of accessing their functionality. The primary types are as follows:

Fig. 10.1 The ISQL/W product in Microsoft SQL Server enables the user to access the data in a database interactively.

Listing 10.1 Embedding an SQL Statement into a Visual Basic Application

SQLQuery = "UPDATE Titles SET Title = 'My Right Foot'"
SQLQuery = SQLQuery & " WHERE Title = 'My Right Hand';"
MyDatabase.Execute SQLQuery ' Execute the query.

In this example, the object MyDatabase is a database. The Execute method executes the SQL string to the right. The SQLQuery string variable will contain the phrase "UPDATE Titles SET Title = 'My Right Foot' WHERE Title = 'My Right Hand';"

Listing 10.2 Callproc.bas Calling a Stored Procedure

Private Sub cmdProc_Click
' Demo calling stored procedure that returns
' a value via a select statement
Dim db As Database
Dim qry As QueryDef
Dim rsStoredProc As Recordset
Set db = DBEngine(0).OpenDatabase(App.Path & "\PROCS.MDB", False, False)
Set qry = db.CreateQueryDef("ImportantCustomer")
qry.SQL = "EXEC IMPORTANT_CUSTOMER 'ALWAO'"
qry.Connect = "ODBC;DATABASE=Northwind; _ DSN=NWIND;UID=Sa;PWD="
qry.ReturnsRecords = True
Set rsStoredProc = qryTemp.OpenRecordSet()
MsgBox "Returned value was :" & Str$(rsStoredProc(0))
qry.Close
rsStoredProc.Close
db.Close
end Sub

First, create a QueryDef called ImportantCustomer and set its SQL property to the Microsoft SQL Server specific command required to call the stored procedure EXEC IMPORTANT_CUSTOMER 'ALWAO'. Set its connect property to the values required to connect to the server. Indicate that it will return records by setting its ReturnsRecords property. For a more thorough explanation of stored procedures, refer to Chapter 9, "Client/Server Databases."

Creating a Database

Database creation is the most product-specific aspect of using an SQL-based database management system. For Visual Basic users, Data Manager can be used to create a Jet database. If you own Microsoft Access, you can create a database using that product, and then use Visual Basic to perform operations on it.

If you are using Microsoft SQL Server, you need to go into the "Microsoft SQL Administrator" application that ships with the product to create a database. Figure 10.2 illustrates how this is accomplished.

Fig. 10.2 The SQL Server Administrator is where you go to create a database

If you are using another product, refer to the documentation that ships with the software for specific instructions on how to create a database in that environment. The procedure varies widely from one database management system to another.

Creating Tables

As the proud owner of a new database, you are now ready to add a table to it. In the best textbook tradition, create an example database that manages the enrollment of students in courses. The tables and columns are listed in Table 10.1.

Table 10.1

Table 10.1 An Example Student Enrollment Database

Table Name Column Name Data Type Length Key
students_table student# INTEGER Yes
student_name CHARACTER 20
address CHARACTER 30
city CHARACTER 20
state CHARACTER 2
zip CHARACTER 5
sex CHARACTER 1
teachers_table teacher# INTEGER Yes
teacher_name CHARACTER 20
phone CHARACTER 10
salary FLOAT
courses_table course# INTEGER Yes
course_name CHARACTER 20
department CHARACTER 16
num_credits INTEGER
sections_table course# INTEGER Yes
section# INTEGER Yes
teacher# INTEGER
num_students INTEGER
enrolls_table course# INTEGER Yes
section# INTEGER Yes
student# INTEGER Yes
grade INTEGER

The SQL keywords in this example are all capitalized and the user defined words are all in small letters. This is done to help you distinguish which parts of the syntax are literal SQL and which parts are under your control.

These tables are designed to help illustrate the features of the language. While another example featuring a factory assembly line or a Wall Street brokerage might be more visually interesting than yet another student database, the concepts would be harder to comprehend.

The SQL code to create the tables is simple. Listing 10.3 shows the statements needed to create the students_table.

Listing 10.3 Statements Needed to Create students_table

CREATE TABLE students_table
(student# int,
student_name char(20),
address char(30),
city char(20),
state char(2),
zip char(5),
sex char(1) )

The CREATE TABLE keywords indicate, obviously, that this is a new table. The field names along with their data type and length follow. The above syntax is correct for Transact SQL. Other dialects may place a comma differently or place a semicolon at the end, but this syntax is fairly common.

It has become popular in recent years for database management system vendors to provide utilities to manage tables. These can be more convenient to use and free you from some syntax headaches. Be sure to know how to use SQL to perform these tasks also, because, if you live long enough, you will find yourself working in an environment where SQL is the only method available.

One practical application of the SQL CREATE TABLE method is during testing. Many database administrators create procedures that create and populate the tables using SQL. Each day, the testing process updates and deletes the data in the tables. The administrator will then delete the changed tables and recreate them using the script, thus restoring the tables to their original state, ready for another day of testing.

Listing 10.4 contains the code to create the rest of the tables in the example database.

Listing 10.4 Cretabl1.sql Creating SQL Tables

CREATE TABLE teachers_table
(teacher# int,
teacher_name char(20),
phone char(10),
salary float )
CREATE TABLE courses_table
(course# int,
course_name char(20),
department char(16),
num_credits int)
CREATE TABLE sections_table
(course# int,
section# int,
teacher# int,
num_students int)
CREATE TABLE enrolls_table
(course# int,
section# int,
student# int,
grade int)

Primary Keys

The primary key of a table is defined as the minimum set of fields that make a row unique. For example, the government administers Social Security by means of your Social Security Number. This is the unique key for an individual that assures that all of the payments made on your behalf are recorded as being from you. No other combination of attributes—name, hair color, place of birth, and so on—could guarantee uniqueness across a set of 100 million people.

In the same fashion, every row in a table must be unique from all other rows in the same table according to the rules set forth by Dr. Codd in his pivotal paper of 1970. The reason for this lies in the need for a row to be updated without error. If you get a raise, you want it to appear in your paycheck, not in your office mate's paycheck.

The mechanism for guaranteeing this in a relational database is the primary key. This field, or fields, when taken as a group must be unique. In some cases, as in the Social Security example, no natural field or set of fields is satisfactory, so an artificial key field, the Social Security Number, was created. Artificial key fields are nice in that it is easy to make them unique, but the chore of assigning them can prove burdensome.

SQL systems use a variety of techniques to designate a field or fields as the primary key of a table. Transact SQL uses the following syntax:

sp_primarykey students_table, student#

This syntax tells SQL Server that student# is the primary key. At a minimum, the database management system must enforce uniqueness for that row based on the primary key. This means that if a user attempts to add another row to the table and that row has the same primary key value as one already in the table, then the system will issue an error message. In addition to this, many systems use the primary key to order the row physically on the disk when they compress the database, but that is the vendor's prerogative.

Listing 10.5 Adding the Primary Keys to the Tables in the Student Database

sp_primarykey students_table, student#
sp_primarykey teachers_table, teacher#
sp_primarykey courses_table, course#
sp_primarykey sections_table, course#, section#
sp_primarykey enrolls_table, course#, section#, student#

The sp_primarykey keyword tells SQL Server that you want to designate the columns after the table name to be the primary key. Notice that on sections_table and enrolls_table the primary key is compound, meaning that it is made up of multiple columns in the row.

It is sometimes useful to query the server to be sure that the key is defined properly. The sp_helpkey command displays this information. The syntax of this command is as follows.

sp_helpkey enrolls_table
keytype object related_object object_keys related_keys
---------- ------------------------------ -----------------------
primary enrolls_table -- none -- course#, section#, student#,*

This report tells that enrolls_table has three keys: courses#, section#, and student#.

Adding Data to Tables

Understanding how tables are defined and how keys are designated prepares you for data input into the tables. This is done in SQL using the following syntax:

INSERT INTO students_table
( student#, student_name, address, city, state, zip, sex)
VALUES ( 593, 'Joe Smith', '123 4th St.', 'Auburn', 'AL', '32456', 'M')

The first set of parentheses enclose the names of the fields that you want to update. This list can be omitted if you want to update all of the columns in a row, and if you provide a data value for each of them. In that case, the database server will assign the first value to the first field in the table, the second value to the second field, and so on. The syntax for the previous example could be written as follows:

INSERT INTO students_table
VALUES ( 593, 'Joe Smith', '123 4th St.', 'Auburn', 'AL', '32456', 'M')

Table 10.2 lists all of the data that needs to be loaded into the example tables in order to provide data to the rest of the examples in this chapter.

Table 10.2

Table 10.2 Data for the students_table

No. Name Street City State Zip Sex
593 Joe Smith 123 4th St. Auburn AL 32456 M
594 Pomare Piti 1121 Tahiti St. Miami FL 12121 F
595 Ed Oken 1234 Rue de Vallee Hollywood CA 21212 M
596 Angel Loon P.O.Box 657 Amsterdam GA 31204 F
597 John Byron 4434 London Dr New York NY 34565 M
598 J. Roggeveen 876 Honden Rd Hoorne RI 34534 M
599 James Cook 7878 Astrolab Ct OLakes FL 34545 M
600 Joseph Banks 23 Botany Bay Brisbaine CA 87803 M
601 Addison Pratt 78043 Tubai La Large OR 43980 M
602 Eliza Pratt 78432 Pahi Rd Rockville MD 29796 F

After loading in all of these rows into the table, you can run the following command to inspect all of the data in the table.

SELECT * FROM students_table

This will give you a listing of the rows as shown in figure 10.3.

Fig. 10.3 Selecting the rows in a table enables you to inspect them for correctness.

The following tables give you the rest of the data for the other tables.

Table 10.3

Table 10.3 Data for the teachers_table

No. Name Phone Number Salary
101 Ralph Daniel 798-0989 80000
102 Sandy Turner 897-9089 80001
103 Lantie Hearn 546-0397 65000
104 John Baluch 494-9787 66000
105 Carl Kennedy 777-4981 110000

To put this data into the teachers table, use the following SQL statement once for each row in the table:

INSERT INTO teachers_table
VALUES (101, 'Ralph Daniel', '798-0989', 40000)

Table 10.4Table 10.4 Data for the courses_table

Course No Course Name Department Credits
1001 Underwater Basketweaving Recreation 7
1002 The History of Rap Music 9
1003 Country Music Theory Music 7
1004 Paris on 3500 Calories a Day PE 3
1005 Life after Baseball Strikes PE 7

To put this data into the teachers table, use the following SQL statement once for each row in the table:

INSERT INTO courses_table
VALUES (1001, 'Underwater Basketweaving', 'Recreation', 7)

Table 10.5

Table 10.5 Data for the sections_table

Course No. Section No. Teacher No. No. of Students
1001 1 105 23
1001 2 104 12
1002 1 101 45
1003 1 101 12
1004 1 102 9
1004 2 103 16
1004 3 105 77
1005 1 101 2

To put this data into the teachers table, use the following SQL statement once for each row in the table:

INSERT INTO sections_table
VALUES (1001,1,105,23)

Table 10.6

Table 10.6 Data for the enrolls_table

Course No. Section No. Student No. Grade
1001 1 593 90
1001 1 594 87
1001 1 595 77
1001 2 596 99
1001 2 597 66
1001 2 598 79
1001 2 599 81
1002 1 600 89
1002 1 601 87
1002 1 602 75
1002 1 603 92
1003 1 603 100
1003 1 595 76
1003 1 596 78
1004 1 593 98
1004 1 594 65
1004 1 595 22
1004 2 596 67
1004 2 597 87
1005 1 593 86
1005 1 603 87
1005 1 600 83
1005 1 601 79
1005 1 599 73
1005 1 598 72
1005 1 597 91
1005 1 596 87

To put this data into the teachers table, use the following SQL statement once for each row in the table:

INSERT INTO enrolls_table

VALUES (1005,1,596,87)

At this point, you have five tables in the database, each containing a number of rows to experiment with. The rest of the chapter will be spent learning how to use this data.

Changing the Schema of a Table

One of the inadequacies of the SQL standard is in the area of schema changes. Standard SQL provides no way to change the composition of a table once it is created. Given the nature of real world projects, it is unlikely that even half of the tables in a database are complete and never need to add or remove columns.

Almost every vendor has had to come up with a way of performing this function. Transact SQL, DB/2, Oracle, and others support the ALTER command to provide this functionality. The syntax of the ALTER command is as follows:

ALTER TABLE students_table ADD new_column char(8) NULL

The only operation that is allowed using the ALTER command is to add a column. The only columns that can be added are those that can contain nulls. The server creates space for the data in the table, but because it does not know what the values will be yet, it inserts nulls in every new field in every existing row.

Deleting a column is occasionally required in the real world to avoid a database filled with obsolete and unused data. There is no easy way to perform this function in SQL. The strategy most often employed is as follows:

1. Create a new table with the schema the way that you want it.

2. Copy the data from one table to the other using a query within an update.

3. Drop the original table after you are sure that the data transfer worked.

4. Finally, rename the new table to the same name as the old one.

The following example shows how this is done. The creation of the new table is a familiar process. Note that the new table name is created with the names of the columns exactly as they will appear in the finished table.

CREATE TABLE new_courses_table
(course# int,
new_course_name char(20),
num_credits int)

You now have two tables that contain fields about courses: courses_table, and new_courses_table. If you want a primary key assigned to the course# field in the new table, you must execute the sp_primarykey command.

sp_primarykey new_courses_table, course#

Now, the new table is ready for data. There are a number of ways of getting the data from the first table to the second. The first way that comes to mind for most programmers is to select one row at the time, and perform n updates for n rows. This approach has the following three disadvantages:

The preferred approach is to use a SELECT statement within the UPDATE command to choose the columns that you want to update.

INSERT INTO new_courses_table (course#,
new_course_name, num_credits)
SELECT course#, course_name, num_credits from courses_table

This syntax tells the database server to retrieve the rows and columns from the courses table that meet its specifications. This data is then handed to the INSERT statement on a column by column basis. The result is a copying of the data from one table to the other.

If you run a SELECT on the new table, you will see that it worked perfectly. All of the data that was needed in the new_courses_table has been moved into it.

SELECT * from new_courses_table
The result is listed here:
course# new_course_name num_credits
----------- -------------------- -----------
1001 Underwater Basketwea 7
1002 The History of Rap 9
1003 Country Music Theory 7
1004 Paris on 3500 Calori 3
1005 Life after Baseball 7
(5 row(s) affected)

Now you have two tables containing similar data. The courses_table has the old schema but the correct name, and the new_courses_table has the new schema and correct data but the old name.

The next step is to delete the original table from the database. This is done so that there will be no name conflict when you rename the new table to the old name.

DROP TABLE courses_table

Finally, you can rename the new_courses_table as courses_table without a name conflict.

sp_rename new_courses_table, courses_table

The sp_rename command is specific to Transact SQL. Other dialects provide their own methods for accomplishing this. SQL Server gives the following confirmation that the change was successful:

Object name has been changed.

In order to convince yourself that this convoluted procedure worked, SELECT all rows and all columns from the courses_table and examine the results. Figure 10.4 shows the results of this procedure. The courses_table now has one fewer column than before, and one of the columns has a new name.

select * from courses_table

Fig. 10.4 The courses_table now has a new schema that includes one fewer column and one renamed column.

The basis of this procedure was the ability to use the set processing features of SQL to avoid a lot of coding and network traffic. The stronger SQL programmers understand set theory and the set operations of the language and make effective use of them. The result is a simpler system that executes faster as well.

Changing the Data in the Tables

The most common tasks in a database application are updating and retrieving data. Commonplace systems like an order handling system provide screens to a sales person who uses them to enter data. The warehouse pulls the order up on a screen, fills it and marks it as complete via another set of screens. The billing department uses the same database to prepare an invoice to be mailed to the customer. Finally, the accounts receivable department enters the payment into the database and the transaction is complete.

The basic flow of this sample system and nearly every database application is the creation of a row in a table, which is then updated several times before it is complete. Systems that handle trouble calls from customers follow a simple pattern. An operator logs a call into a database. A dispatcher sends a crew to perform the repair and logs that fact to the database. The crew completes the repair and logs that also.

These systems rely on the capability of one set of workers to update a common database which will be used by co-workers to perform their functions. These systems rely on the UPDATE command in SQL to change the values of the fields in the database. The syntax of the UPDATE command is as follows:

UPDATE teachers_table
SET salary = 71000
WHERE salary < 71000

SELECT the data from this table before you execute this command. The values are as follows:

teacher# teacher_name phone salary
----------- -------------------- ---------- -------
101 Ralph Daniel 798-0989 80000.0
102 Sandy Turner 897-9089 80001.0
103 Lantie Hearn 546-0397 65000.0
104 John Baluch 494-9787 66000.0
105 Carl Kennedy 777-4981 110000.0
(5 row(s) affected)

Next, run the UPDATE above and observe that the values in the table are now changed, as shown here:

teacher# teacher_name phone salary
----------- -------------------- ---------- ---------
101 Ralph Daniel 798-0989 80000.0
102 Sandy Turner 897-9089 80001.0
105 Carl Kennedy 777-4981 110000.0
103 Lantie Hearn 546-0397 71000.0
104 John Baluch 494-9787 71000.0
(5 row(s) affected)

Notice again that all of the rows in the table were treated as a set by SQL. The WHERE clause tells which rows are to be affected by the command. The UPDATE verb told which columns were to receive what value. Contrast this with the following third-generation language or procedural approach:

1. Read a row.

2. Test the value of the salary field.

3. If the salary < 71000, update it to 71000.

4. Write the row back to the file.

Comparing the simplicity of the UPDATE command with the complexity of the procedural approach highlights the value of the set processing mentality.

The UPDATE command can handle calculations as well. Suppose that in the previous example you wanted to give everyone a 5% raise. This can be done directly in SQL using the following syntax:

UPDATE teachers_table
SET salary = (salary * 1.05)

The result of this action is the following:

teacher# teacher_name phone salary
----------- -------------------- ---------- ---------
101 Ralph Daniel 798-0989 84000.0
102 Sandy Turner 897-9089 84001.05
105 Carl Kennedy 777-4981 115500.0
103 Lantie Hearn 546-0397 74550.0
104 John Baluch 494-9787 74550.0
(5 row(s) affected)

The absence of a WHERE clause caused the UPDATE to be applied to every row in the table in the same way that omitting a where clause in a SELECT statement causes every row in the table to be retrieved.

Updates are not limited to single values either. Several fields can be updated at the same time.

UPDATE teachers_table
SET salary = (salary * 1.01),
teacher_name = 'Ralphie Daniel'
WHERE teacher# = 101
teacher# teacher_name phone salary
----------- -------------------- ---------- ---------
101 Ralphie Daniel 798-0989 84840.0

In this example, two fields in the same row were updated at the same time.

Complex UPDATE commands can be achieved by using a SELECT in the WHERE clause like you did above in the Insert example. In the following example, give the teachers whose salaries are below average a 2% raise. Don't give the ones that are at or above average any raise at all. If you think about how to do this using a procedural approach, it will look hard. Using the power of the set processing mentality, however, it becomes much simpler.

UPDATE teachers_table
SET salary = salary * 1.02
WHERE salary < (SELECT AVE(Salary)
FROM teachers_table)

The before picture of the table shows the old salaries.

teacher# teacher_name phone salary
----------- -------------------- ---------- --------
102 Sandy Turner 897-9089 84001.05
105 Carl Kennedy 777-4981 115500.0
103 Lantie Hearn 546-0397 74550.0
104 John Baluch 494-9787 74550.0
101 Ralphie Daniel 798-0989 84840.0
(5 row(s) affected)

After running the UPDATE, the salaries of four of the teachers have been increased according to the logic of the SQL statement.

teacher# teacher_name phone salary
----------- -------------------- ---------- --------
105 Carl Kennedy 777-4981 115500.0
102 Sandy Turner 897-9089 85681.071
103 Lantie Hearn 546-0397 76041.0
104 John Baluch 494-9787 76041.0
101 Ralphie Daniel 798-0989 86536.8
(5 row(s) affected)

In this example also, you saw how much more powerful the set processing approach is than the procedural approach. In fact, much of the system design of a production application involves finding a strategy to avoid moving any more data than necessary over the network.

Deleting Data and Tables

Deleting data from a table is very simple using SQL. If you want to delete the data in one field in one row, then the strategy would be to assign a 0, blank, or NULL value to that field.

UPDATE students_table
SET new_column = NULL
WHERE student# > 600

The state of the data in the table before the execution of the table has the string abc in the new_column.

student# student_name new_column
----------- -------------------- ----------
593 Joe Smith abc
594 Pomare Piti abc
595 Ed Oken abc
596 Angel Loon abc
597 John Byron abc
598 Jacob Roggeveen abc
599 James Cook abc
600 Joseph Banks abc
601 Addison Pratt abc
602 Eliza Pratt abc

After the execution of the UPDATE command, the values of the last two student rows are changed to NULL according to the logic of the SQL statement.

student# student_name new_column
----------- -------------------- ----------
593 Joe Smith abc
594 Pomare Piti abc
595 Ed Oken abc
596 Angel Loon abc
597 John Byron abc
598 Jacob Roggeveen abc
599 James Cook abc
600 Joseph Banks abc
601 Addison Pratt (null)
602 Eliza Pratt (null)

The null values were allowed in new_column because the keyword NULL was used in the ALTER command that added this column.

ALTER TABLE students_table ADD new_column char(8) NULL

The null value means that data is missing from the table, normally because it is unknown. Whenever a calculation like an average takes place on a column that has nulls, the null value is ignored in both the numerator and the denominator.

Often programmers are tempted to code applications where they store 0 in a numeric field whenever data is unknown. This can cause problems with procedures like the AVG() function. To illustrate this point, take the salaries of the teachers.

teacher# salary
----------- --------
102 85681.07
103 76041.0
104 76041.0
101 86536.8
105 0.0

Suppose that you know the salary of every teacher except teacher# 105. In real systems, perhaps that salary hasn't been determined yet. Normally you would put 0 in order to be able to add the teacher to the database. This works fine when people read it because they know that people don't work for free, and so they are correctly able to determine that this data is missing. If they were averaging the numbers by hand, then they would be smart enough to omit that value and calculate the average of the other values. The average that they would calculate would be $81.074.97. But, if they were content to average numbers by hand, they wouldn't be paying you to automate their business, would they?

Next, set the computer to use the AVG() function to calculate this average.
Select AVG(salary) from teachers_table

The answer that it gives is as follows (which is incorrect):

64859.9742

The computer literally averaged all of the fields in the database that it was instructed to average. Thus, the result was wrong.

The first thing to try would be to change the 0.0 to NULL. This is an illegal action because the salary field was defined as the default of NOT NULL when the table was created. SQL Server gives the following error message:

Msg 233, Level 16, State 1
The column salary in table teachers_table may not be null.

ALTER the table to add a new column that has the NULL attribute, and then copy the data from the salary column to this new column.

ALTER TABLE teachers_table ADD new_salary int NULL

The table now looks like the following:

teacher# teacher_name salary new_salary
----------- -------------------- ---------- -----------
102 Sandy Turner 85681.071 (null)
103 Lantie Hearn 76041.0 (null)
104 John Baluch 76041.0 (null)
101 Ralphie Daniel 86536.8 (null)
105 Carl Kennedy 0.0 (null)

Update the table to copy the salaries into the new_salary field.

UPDATE teachers_table
SET new_salary = salary
WHERE salary > 0

The result is the following table:

teacher# teacher_name salary new_salary
----------- -------------------- ---------- -----------
105 Carl Kennedy 0.0 (null)
102 Sandy Turner 85681.071 85681
103 Lantie Hearn 76041.0 76041
104 John Baluch 76041.0 76041
101 Ralphie Daniel 86536.8 86536
Now, use the AVG function again to see what happens.
Select AVG(new_salary) from teachers_tab

The result is a much more pleasing average that matches what a person would calculate when performing this task.

81074

Deleting a row from a table is very easy. It is accomplished by the following syntax:

DELETE FROM teachers_table

WHERE salary > 100000

The result is a "reengineered" workforce that is absent the highest-paid technical person.

teacher# teacher_name salary new_salary
----------- -------------------- -------------- -----------
105 Carl Kennedy 0.0 (null)
102 Sandy Turner 85681.071 85681
103 Lantie Hearn 76041.0 76041
104 John Baluch 76041.0 76041

You can see that the expected result was achieved. The offending employee has been dismissed from the teachers table.

Retrieving Data from Tables

By far, the most common operation on a database management system is a query. A query is a transaction whose goal is to retrieve data from the database. In SQL, the query verb is SELECT. SELECT statements follow the form:

SELECT <column names> FROM <table name>

An example of this would be:

SELECT course#, course_name FROM courses_table

The result of this query would be:

course# course_name
----------- --------------------
1001 Underwater Basketwea
1002 The History of Rap
1003 Country Music Theory
1004 Paris on 3500 Calori
1005 Life after Baseball

Another variation on this simple SELECT statement would be to retrieve all of the columns from all of the rows.

SELECT * FROM courses_table

The results would be as follows:

course# course_name department num_credits
----------- -------------------- ---------------- -----------
1001 Underwater Basketwea Recreation 7
1002 The History of Rap Music 9
1003 Country Music Theory Music 7
1004 Paris on 3500 Calori PE 3
1005 Life after Baseball PE 7

Be selective in your use of the wildcard (*) in production systems. If you create a report using the * and format it to look just right, adding a row to the schema of the table will make your report look bad. In embedded SQL procedures, it is best to avoid the use of the *, even if you want all of the columns. The * is very useful, however, for interactive sessions where you are testing the data or some SQL syntax.

The simple WHERE Clause

While you sometimes want to look at every row in a table, you often want only a subset of those rows. The SQL mechanism for telling the system which of these rows you want is the WHERE clause. The syntax of the WHERE clause is as follows:

SELECT <column names> FROM <table name> WHERE <predicate>

An example of this would be as follows:

SELECT course_name FROM courses_table WHERE course# = 1003

The result of this query is as follows:

course_name
--------------------
Country Music Theory

The predicate in this SELECT statement is course# = 1003. It is also legal to have a predicate that returns more than one row, as in the following example:

SELECT course#, course_name FROM courses_table WHERE course# > 1003

The result of this query is the following two rows:

course# course_name
----------- --------------------
1004 Paris on 3500 Calori
1005 Life after Baseball

Developing skill in the writing of predicates is what separates the good SQL programmer from the average one. The rest of this chapter concerns predicate writing.

The Compound WHERE Clause

It is often necessary to satisfy multiple conditions in one SELECT statement. If you wanted all students who lived in Florida you would use the following:

SELECT student#, student_name FROM students_table WHERE
state = 'FL'

The result is as follows:

student# student_name
----------- -------------
594 Pomare Piti
599 James Cook

If you wanted all male students, you would use the following:

SELECT student#, student_name FROM students_table WHERE sex = 'male'

The result is as follows:

student# student_name
----------- --------------------
593 Joe Smith
595 Ed Oken
597 John Byron
598 Jacob Roggeveen
599 James Cook
600 Joseph Banks
601 Addison Pratt

One strategy would be to retrieve all of the rows from one table into your application, then select the ones that meet the second criteria. Alternately, you could use a compound SELECT statement and let the database management system do the work for you before it sends the result over the network.

SELECT student#, student_name FROM students_table WHERE sex = 'M' AND state = 'FL'

The result is as follows:

student# student_name
----------- --------------------
599 James Cook

The efficiency of your program is greatly enhanced when you allow the database management system to manage the selection of the rows for you. Modern database servers have rule-based query optimizers that analyze a request and create a strategy of retrieval that it predicts would be the fastest to execute. In this case, it would discover that the state field part of the predicate would yield the fewest candidate rows, two. It would then have only two rows to evaluate on the sex = 'M' part of the predicate.

Another flavor of the compound predicate contains the keyword OR. It is used when there are several possible values that would be acceptable. Suppose that a student wanted a ride home for the summer. Because they live in Georgia, another student from either Georgia or Florida would do, as Florida students drive through Georgia on the way home. This query could be done by brute force or it could be done as follows:

SELECT student_name, state from students_table WHERE
state = 'FL' OR state = 'GA'

The result of this selection is as follows:

student_name state
-------------------- -----
Pomare Piti FL
Angel Loon GA
James Cook FL

Parentheses are sometimes used to clarify the request to the database management system. The following request is ambiguous:

SELECT student#, student_name FROM students_table
WHERE state = 'FL' or state = 'GA' AND sex = 'M'

Does this SQL statement want students from 'GA' who are male and both male and female student from 'FL', or does it want male students from either state? It is likely that different servers could parse this sentence differently and give different answers. If you rewrite the sentence as follows:

SELECT student_name, state, sex FROM students_table
WHERE (state = 'FL' OR state = 'GA') AND sex = 'M'

then the confusion dissipates; the meaning becomes obvious. The result is as follows:

student_name state sex
-------------------- ----- ---
James Cook FL M

The Not keyword is also part of the legal syntax of the SQL predicate. If you were willing to go anywhere for spring break except 'NY', then you could say:

SELECT student_name, state FROM students_table
WHERE NOT (state = 'NY')

The result would be as follows:

student_name state
-------------------- -----
Joe Smith AL
Pomare Piti FL
Ed Oken CA
Angel Loon GA
Jacob Roggeveen RI
James Cook FL
Joseph Banks CA
Addison Pratt OR
Eliza Pratt MD

Notice that all rows are present except the one student from the "Big Apple".

BETWEEN and NOT BETWEEN

It is common to want to select a set of rows with a value of a certain column falling within a certain range, as in the following:

SELECT student_name, zip FROM students_table
WHERE zip BETWEEN '30000' AND '40000'

The result is as follows:

student_name zip
-------------------- -----
Joe Smith 32456
Angel Loon 31204
John Byron 34565
Jacob Roggeveen 34534
James Cook 34545

By the same token, you might want all of the rows that are not in that range:

SELECT student_name, zip FROM students_table

WHERE zip NOT BETWEEN '30000' AND '40000'

The result is as follows:

student_name zip
-------------------- -----
Pomare Piti 12121
Ed Oken 21212
Joseph Banks 87803
Addison Pratt 43980
Eliza Pratt 29796

As expected, this query chose all of the rows with zip codes outside of that range.

LIKE and NOT LIKE

Many times you want to query data where you have only part of the criteria necessary to completely specify the predicate. This necessitates the use of an approximation in the WHERE clause. The LIKE operator gives us that capability.

SELECT student_name FROM students
WHERE student_name LIKE 'Jo%'

The result of this query is as follows:

student_name
--------------------
Joe Smith
John Byron
Joseph Banks

Another variation on the LIKE theme is the NOT LIKE keywords.

SELECT student_name FROM students
WHERE student_name NOT LIKE 'Jo%'

The result of this query is—as you might predict—as follows:

student_name
--------------------
Pomare Piti
Ed Oken
Angel Loon
Jacob Roggeveen
James Cook
Addison Pratt
Eliza Pratt

IN and NOT IN

Earlier, you saw how a WHERE clause could be used to find a student from 'GA' or 'FL'. The compound predicate worked well for two values but would become tedious if you needed to find students from nine or 10 states. To facilitate this type of query, a special keyword called IN is available. Instead of the following:

SELECT student_name, state from students_table WHERE state = 'FL' OR state = 'GA' OR state = 'AL' OR ...

You could use the IN keyword, as follows:

SELECT student_name, state from students_table WHERE state IN ( 'FL','GA','AL')

The result is the same as if you had strung together a set of OR's in the predicate.

student_name state
-------------------- -----
Joe Smith AL
Pomare Piti FL
Angel Loon GA
James Cook FL

As with the other predicate keywords, IN can be written NOT IN, as follows:

SELECT student_name, state from students_table WHERE state IN ( 'FL','GA','AL')

The result will be as follows:

student_name state
-------------------- -----
Ed Oken CA
John Byron NY
Jacob Roggeveen RI
Joseph Banks CA
Addison Pratt OR
Eliza Pratt MD

IS NULL and IS NOT NULL

If a field can contain nulls, there needs to be a way to check for the null value. Because the values actually representing NULL varies from system to system, a way of testing for NULL is required. SQL implements this through the IS NULL and the IS NOT NULL keyword phrases.

SELECT student_name, new_column FROM students_table
WHERE new_column IS NULL

The result of this query is as follows:

student_name new_column
-------------------- ----------
Addison Pratt (null)
Eliza Pratt (null)

The opposite effect can be achieved by using the following IS NOT NULL keyword phrase:

SELECT student_name, new_column FROM students_table
WHERE new_column IS NOT NULL

The result of this query is as follows:

student_name new_column
-------------------- ----------
Joe Smith abc
Pomare Piti abc
Ed Oken abc
Angel Loon abc
John Byron abc
Jacob Roggeveen abc
James Cook abc
Joseph Banks abc

ORDER BY

Relational database theory states that there is no order to the rows in a table. This means that all ordering of rows must be done at retrieval time. Because people can understand ordered data much easier than they can understand unordered data, it is a requirement in some applications to sort data. SQL provides a built-in syntax, ORDER BY, to specify how you want the results of the query sorted.

SELECT teacher_name, salary FROM teachers_table
ORDER BY salary

The result of this query is as follows:

teacher_name salary
-------------------- ----------
Carl Kennedy 0.0
John Baluch 76041.0
Lantie Hearn 76041.0
Sandy Turner 85681.071

In similar fashion, you can order rows in descending order.

SELECT teacher_name, salary FROM teachers_table
ORDER BY salary DESC

The results are now ordered in the opposite fashion.

teacher_name salary
-------------------- ------------------------
Sandy Turner 85681.071
John Baluch 76041.0
Lantie Hearn 76041.0
Carl Kennedy 0.0

Calculated Columns

There are times when you want to retrieve and display a calculation, without changing the data in the database. "What-if" logic is an example of this. What would happen to salaries if you gave everyone a raise of 4% instead of a raise of 5%?

SELECT teacher_name, salary, salary4% = salary * 1.04,
salary5% = salary * 1.05 FROM teachers_table
teacher_name salary salary4 salary5
-------------------- --------- ------------ ----------
Carl Kennedy 0.0 0.0 0.0
Sandy Turner 85681.07 89108.31 89965.12
Lantie Hearn 76041.0 79082.64 79843.05
John Baluch 76041.0 79082.64 79843.05

In addition to the normal arithmetic operators (+ - * /) there are a number of aggregate operations built into SQL. MIN(), MAX(), SUM(), AVE(), and COUNT() are all supported. In addition, specific database management systems provide other functions.

SELECT SUM(salary) FROM teachers_table

The result of this query is as follows:

----------
237763.071

Another interesting query is as follows:

SELECT COUNT(course#) FROM sections_table

The result of this query is as follows:

-----------
8

If you do a SELECT on those course# fields in sections_table:

SELECT course# FROM sections_table

You will find that the course# field contains a lot of the same courses:

course#
-----------
1001
1001
1002
1003
1004
1004
1004
1005

If the question being posed in the query is "How many different courses are being taught?", then the following query would be a better choice:

SELECT COUNT(DISTINCT course#) FROM sections_table

The result of this query might better serve your purposes.

-----------

5

Thus, there are five distinct course numbers being taught out of the eight sections.

Joining Tables

Relational database theory states that no redundant data should be stored in the database. In other words, you would not store the name of the student with her student# in the enrolls field. You already have a way of finding out the name of a student given the student# from the students_table. However, if you do a SELECT on the enrolls table to display grades, you will get the following:

course# section# student# grade
----------- ----------- ----------- -----------
1001 1 593 90
1001 1 593 90
1001 1 594 87
1001 1 595 77
1001 2 596 99
1001 2 597 66
1001 2 598 79
1001 2 599 81
1002 1 600 89
1002 1 601 87
1002 1 602 75
1002 1 603 92
1003 1 603 100
1003 1 595 76
1003 1 596 78
1004 1 593 98
1004 1 594 65
1004 1 595 22
1004 2 596 67
1004 2 597 87
1005 1 593 86
1005 1 603 87
1005 1 600 83
1005 1 601 79
1005 1 599 73
1005 1 598 72
1005 1 597 91
1005 1 596 87

This report is accurate enough, but the teacher would rather have the names of the students on the report also. That will require a joining of two tables to accomplish this. The following query shows how SQL uses the WHERE clause to effect a join.

SELECT course#, section#, student_name, grade
FROM students_table, enrolls_table
WHERE enrolls_table.student# = students_table.student#
The result of this query is as follows:
course# section# student_name grade
----------- ----------- -------------------- -----------
1001 1 Joe Smith 90
1001 1 Joe Smith 90
1004 1 Joe Smith 98
1005 1 Joe Smith 86
1001 1 Pomare Piti 87
1004 1 Pomare Piti 65
1001 1 Ed Oken 77
1003 1 Ed Oken 76
1004 1 Ed Oken 22
1001 2 Angel Loon 99
1003 1 Angel Loon 78
1004 2 Angel Loon 67
1005 1 Angel Loon 87
1001 2 John Byron 66
1004 2 John Byron 87
1005 1 John Byron 91
1001 2 Jacob Roggeveen 79
1005 1 Jacob Roggeveen 72
1001 2 James Cook 81
1005 1 James Cook 73
1002 1 Joseph Banks 89
1005 1 Joseph Banks 83
1002 1 Addison Pratt 87
1005 1 Addison Pratt 79
1002 1 Eliza Pratt 75

In order for a join to be successful, at least one field from each table must have the same domain. In the previous example, the student# fields in both the students_table and the enrolls_table both contained student numbers. This is by design. By performing the join, you were able to display the names of the fields next to the grades even though these two fields are in separate tables.

Joins can be combined with other where clauses in order to perform more complicated tasks. In the following example, only a subset of the rows are returned:

SELECT course#, section#, student_name, grade
FROM students_table, enrolls_table
WHERE enrolls_table.student# = students_table.student# and
grade > 85 and
state IN ('FL','GA','CA')

The result is as follows:

course# section# student_name state grade
----------- ----------- -------------------- ----- -----------
1001 1 Pomare Piti FL 87
1001 2 Angel Loon GA 99
1002 1 Joseph Banks CA 89
1005 1 Angel Loon GA 87

A join can also be created across three tables.

SELECT teacher_name, sections_table.teacher#,
sections_table.course#, course_name
FROM teachers_table, sections_table, courses_table
WHERE teachers_table.teacher# = sections_table.teacher# AND
sections_table.course# = courses_table.course#

The result of this query is as follows:

teacher_name teacher# course# course_name
-------------------- ----------- ----------- --------------------
Carl Kennedy 105 1001 Underwater Basketwea
Carl Kennedy 105 1004 Paris on 3500 Calori
Sandy Turner 102 1004 Paris on 3500 Calori
Lantie Hearn 103 1004 Paris on 3500 Calori
John Baluch 104 1001 Underwater Basketwea

In this rather complex example, there are two key elements. The first was the specification of which columns you want displayed. The fact that the sections_table columns were called out explicitly means that these rows will be the anchors and will determine which rows display. The second key point is the WHERE clause. Because you have three tables joined, the relationships between the three have to be specified. That is why two fields from the sections_table are in the where clause and only one field each from the other two tables.

Relational theory, if followed strictly, will sometimes cause five tables to be joined together.

SELECT teacher_name, course_name, student_name
FROM teachers_table, sections_table, courses_table,
enrolls_table, students_table
WHERE teachers_table.teacher# = sections_table.teacher# AND
sections_table.course# = courses_table.course# AND
sections_table.section# = enrolls_table.section# AND
sections_table.course# = enrolls_table.course# AND
enrolls_table.student# = students_table.student#
ORDER BY teacher_name
The result of this nightmare of a SQL sentence is as follows:
teacher_name course_name student_name
-------------------- -------------------- --------------------
Carl Kennedy Underwater Basketwea Ed Oken
Carl Kennedy Underwater Basketwea Joe Smith
Carl Kennedy Underwater Basketwea Joe Smith
Carl Kennedy Underwater Basketwea Pomare Piti
John Baluch Underwater Basketwea Angel Loon
John Baluch Underwater Basketwea Jacob Roggeveen
John Baluch Underwater Basketwea James Cook
John Baluch Underwater Basketwea John Byron
Lantie Hearn Paris on 3500 Calori Angel Loon
Lantie Hearn Paris on 3500 Calori John Byron
Sandy Turner Paris on 3500 Calori Ed Oken
Sandy Turner Paris on 3500 Calori Joe Smith
Sandy Turner Paris on 3500 Calori Pomare Piti

This join works fine, but would be very hard to maintain. In reality, joins of more than two tables perform poorly in production applications. While relational theory states that there can be no redundant data in a database, real world systems often contain replicated data. In fact, modern systems support replicated data by handling the copying of the data automatically.

From Here...

In this chapter you were introduced to the Structured Query Language (SQL). This language forms the user interface of most modern database management systems.

This chapter also examined how to use SQL to create modify and delete tables. You also learned how to add and delete data in the database. Finally, you learned how to retrieve the data in all sorts of clever ways.


| Previous Chapter | Next Chapter | Search | Table of Contents | Book Home Page |

| Buy This Book | Que Home Page | Digital Bookshelf | Disclaimer |


To order books from QUE, call us at 800-716-0044 or 317-361-5400.

For comments or technical support for our books and software, select Talk to Us.

© 1996, QUE Corporation, an imprint of Macmillan Publishing USA, a Simon and Schuster Company.