Visual Basic Expert SolutionsChapter 10Structured Query LanguageBy Steve Potts |
The Structured Query Language, SQL (pronounced SEE-kwal), is the industry standard database manipulation language. This language enables you to manipulate almost every major database management system in widespread use at this writing.
For the Visual Basic programmer, SQL is important because it is the primary means of communication between Visual Basic and both the Jet database engine and the Microsoft SQL Server database engine that runs under Windows NT Server. While it is possible to create some applications with Visual Basic and the Data Access Object, substantial applications will require that you learn SQL.
SQL traces its genealogy back to the "olden" days of relational database theory. In 1970, Dr. E.F. Codd, then an IBM employee, noticed a theoretical similarity between relational mathematics and the problem of manipulating a database. He wrote a famous paper explaining these similarities, and thus invented the relational database. The first language that Dr. Codd and his assistants at the IBM San Jose laboratory created to implement this theory was a primitive form of SQL. While it differs from modern SQL in important ways, it was the seed which later caused the SQL language to be popularized.
Unlike Visual Basic itself, SQL is a published American National Standards Institute (ANSI) standard (X.3.135-1992). Most of the SQL implementations in use today follow this standard fairly closely. However, because the standard doesn't address 100 percent of the issues needed for a complete implementation of a database management system, the different dialects of SQL differ slightly. While these slight differences complicate the porting of your applications from one database management system (DBMS) to another, they do not become an issue often during the learning of the SQL syntax and implementation strategy. Most SQL dialects contain some useful features that are missing from the ANSI standard. For example, Microsoft's SQL Server product uses a dialect of SQL called Transact SQL. This language contains many more reserved words than the SQL standard calls out. It even supports conditional execution and loops within the SQL statement. Access SQL supports the Transform and Pivot statements that process crosstab queries. These features are useful, but they should be avoided if you plan to port your application to another database management platform that supports its own dialect of SQL.
The goal of SQL is to provide a functional interface between the user and the functionality of the database management system. This is an important concept. The inner workings of a modern database management system are extremely complex. Pointers, multiple processes, indexing schemes, and buffers abound. The goal of SQL is to provide the user with a simple way of manipulating the database, while allowing the database management system development team the freedom necessary to create the complex system needed for good performance.
This chapter approaches the SQL language from the perspective of the Visual Basic programmer. We will drive our understanding beyond the superficial level normally found in Visual Basic books. This approach serves the following two purposes:
The subject of this chapter is the SQL language. Therefore, the following topics will be addressed:
In each section you will create simple, but real applications to illustrate the language concepts.
SQL is a text-oriented, fourth-generation language. Fourth-generation languages are called nonprocedural languages because the statements tell the database management system what you want it to do, without telling it how to do it. The SQL syntax doesn't say, "Go to the first row of the table. Move that row to a buffer. Go to the second row. Move that row into the buffer also. Test if the third field of each row is greater than 17. If it is, delete it from the buffer. Get the next row... " The SQL statement to accomplish this will be SELECT * FROM Table1 WHERE id < 17;.
It is true that the computer cannot understand instructions at the SQL level. They must be translated to a level as low or lower than the instructions in the preceding paragraph. The important thing to understand is that it is the task of the database management system vendor—not the application programmer—to write that code. This greatly increases the speed at which applications can be developed, and also enlarges the pool of people that can develop these applications. Many intelligent people who can enjoy writing SQL statements who would not enjoy coding database calls.
This does not mean that SQL is a "wimpy" language that never gets complex. Complex queries are needed at times, but the majority of the SQL routines that you will write are fairly straightforward.
Set-oriented languages differ from traditional array-oriented languages in the way that they treat data. Traditional database management systems (such as IMS on the mainframe) assume that the receiver of the data is using the same computer that the database management system is using. These products are built to provide data to the application where the application logic can perform the task of selecting the data that is needed by the application. This strategy breaks down when the database management system is located on a different computer from the application. The strategy of culling out the unwanted data at the application computer can cause a performance problem over the network. For example, if your application wanted to retrieve the record for one of your company's 175,000 employees whose salary was over $30,000, the array-oriented query language would retrieve all 175,000 records, send them to your application, which would then throw away half of them (those employees with salaries under $30,000). This approach would choke a network for half an hour.
A set-oriented database management system is intelligent enough to perform all of the selections before sending the data that passes the test over the wire. If you send a request to SELECT * FROM employee_table WHERE employee_num = "G546897";, the SQL-based database management system is smart enough to pass you only the one row that satisfies the request. If the request reads "SELECT * FROM employee_table WHERE salary > 30,000;", then the database management system will cull out all rows in the table which fail this test, and pass the rows that pass the selection criteria over the network.
One of the differentiating skills of the SQL programmer is her ability to make SQL do as much of the work of selecting rows as possible. Always keep network traffic issues in mind when designing your SQL-based applications.
Most SQL-based database management systems support a variety of ways of accessing their functionality. The primary types are as follows:
Listing 10.1 Embedding an SQL Statement into a Visual Basic Application
In this example, the object MyDatabase is a database. The Execute method executes the SQL string to the right. The SQLQuery string variable will contain the phrase "UPDATE Titles SET Title = 'My Right Foot' WHERE Title = 'My Right Hand';"
Listing 10.2 Callproc.bas Calling a Stored Procedure
First, create a QueryDef called ImportantCustomer and set its SQL property to the Microsoft SQL Server specific command required to call the stored procedure EXEC IMPORTANT_CUSTOMER 'ALWAO'. Set its connect property to the values required to connect to the server. Indicate that it will return records by setting its ReturnsRecords property. For a more thorough explanation of stored procedures, refer to Chapter 9, "Client/Server Databases."
Database creation is the most product-specific aspect of using an SQL-based database management system. For Visual Basic users, Data Manager can be used to create a Jet database. If you own Microsoft Access, you can create a database using that product, and then use Visual Basic to perform operations on it.
If you are using Microsoft SQL Server, you need to go into the "Microsoft SQL Administrator" application that ships with the product to create a database. Figure 10.2 illustrates how this is accomplished.
Fig. 10.2 The SQL Server Administrator is where you go to create a database
If you are using another product, refer to the documentation that ships with the software for specific instructions on how to create a database in that environment. The procedure varies widely from one database management system to another.
As the proud owner of a new database, you are now ready to add a table to it. In the best textbook tradition, create an example database that manages the enrollment of students in courses. The tables and columns are listed in Table 10.1.
Table 10.1
Table 10.1 An Example Student Enrollment Database
Table Name | Column Name | Data Type | Length | Key |
students_table | student# | INTEGER | Yes | |
student_name | CHARACTER | 20 | ||
address | CHARACTER | 30 | ||
city | CHARACTER | 20 | ||
state | CHARACTER | 2 | ||
zip | CHARACTER | 5 | ||
sex | CHARACTER | 1 | ||
teachers_table | teacher# | INTEGER | Yes | |
teacher_name | CHARACTER | 20 | ||
phone | CHARACTER | 10 | ||
salary | FLOAT | |||
courses_table | course# | INTEGER | Yes | |
course_name | CHARACTER | 20 | ||
department | CHARACTER | 16 | ||
num_credits | INTEGER | |||
sections_table | course# | INTEGER | Yes | |
section# | INTEGER | Yes | ||
teacher# | INTEGER | |||
num_students | INTEGER | |||
enrolls_table | course# | INTEGER | Yes | |
section# | INTEGER | Yes | ||
student# | INTEGER | Yes | ||
grade | INTEGER |
The SQL keywords in this example are all capitalized and the user defined words are all in small letters. This is done to help you distinguish which parts of the syntax are literal SQL and which parts are under your control.
These tables are designed to help illustrate the features of the language. While another example featuring a factory assembly line or a Wall Street brokerage might be more visually interesting than yet another student database, the concepts would be harder to comprehend.
The SQL code to create the tables is simple. Listing 10.3 shows the statements needed to create the students_table.
Listing 10.3 Statements Needed to Create students_table
The CREATE TABLE keywords indicate, obviously, that this is a new table. The field names along with their data type and length follow. The above syntax is correct for Transact SQL. Other dialects may place a comma differently or place a semicolon at the end, but this syntax is fairly common.
It has become popular in recent years for database management system vendors to provide utilities to manage tables. These can be more convenient to use and free you from some syntax headaches. Be sure to know how to use SQL to perform these tasks also, because, if you live long enough, you will find yourself working in an environment where SQL is the only method available.
One practical application of the SQL CREATE TABLE method is during testing. Many database administrators create procedures that create and populate the tables using SQL. Each day, the testing process updates and deletes the data in the tables. The administrator will then delete the changed tables and recreate them using the script, thus restoring the tables to their original state, ready for another day of testing.
Listing 10.4 contains the code to create the rest of the tables in the example database.
Listing 10.4 Cretabl1.sql Creating SQL Tables
The primary key of a table is defined as the minimum set of fields that make a row unique. For example, the government administers Social Security by means of your Social Security Number. This is the unique key for an individual that assures that all of the payments made on your behalf are recorded as being from you. No other combination of attributes—name, hair color, place of birth, and so on—could guarantee uniqueness across a set of 100 million people.
In the same fashion, every row in a table must be unique from all other rows in the same table according to the rules set forth by Dr. Codd in his pivotal paper of 1970. The reason for this lies in the need for a row to be updated without error. If you get a raise, you want it to appear in your paycheck, not in your office mate's paycheck.
The mechanism for guaranteeing this in a relational database is the primary key. This field, or fields, when taken as a group must be unique. In some cases, as in the Social Security example, no natural field or set of fields is satisfactory, so an artificial key field, the Social Security Number, was created. Artificial key fields are nice in that it is easy to make them unique, but the chore of assigning them can prove burdensome.
SQL systems use a variety of techniques to designate a field or fields as the primary key of a table. Transact SQL uses the following syntax:
This syntax tells SQL Server that student# is the primary key. At a minimum, the database management system must enforce uniqueness for that row based on the primary key. This means that if a user attempts to add another row to the table and that row has the same primary key value as one already in the table, then the system will issue an error message. In addition to this, many systems use the primary key to order the row physically on the disk when they compress the database, but that is the vendor's prerogative.
Listing 10.5 Adding the Primary Keys to the Tables in the Student Database
The sp_primarykey keyword tells SQL Server that you want to designate the columns after the table name to be the primary key. Notice that on sections_table and enrolls_table the primary key is compound, meaning that it is made up of multiple columns in the row.
It is sometimes useful to query the server to be sure that the key is defined properly. The sp_helpkey command displays this information. The syntax of this command is as follows.
This report tells that enrolls_table has three keys: courses#, section#, and student#.
Understanding how tables are defined and how keys are designated prepares you for data input into the tables. This is done in SQL using the following syntax:
The first set of parentheses enclose the names of the fields that you want to update. This list can be omitted if you want to update all of the columns in a row, and if you provide a data value for each of them. In that case, the database server will assign the first value to the first field in the table, the second value to the second field, and so on. The syntax for the previous example could be written as follows:
Table 10.2 lists all of the data that needs to be loaded into the example tables in order to provide data to the rest of the examples in this chapter.
Table 10.2
Table 10.2 Data for the students_table
No. | Name | Street | City | State | Zip | Sex |
593 | Joe Smith | 123 4th St. | Auburn | AL | 32456 | M |
594 | Pomare Piti | 1121 Tahiti St. | Miami | FL | 12121 | F |
595 | Ed Oken | 1234 Rue de Vallee | Hollywood | CA | 21212 | M |
596 | Angel Loon | P.O.Box 657 | Amsterdam | GA | 31204 | F |
597 | John Byron | 4434 London Dr | New York | NY | 34565 | M |
598 | J. Roggeveen | 876 Honden Rd | Hoorne | RI | 34534 | M |
599 | James Cook | 7878 Astrolab Ct | OLakes | FL | 34545 | M |
600 | Joseph Banks | 23 Botany Bay | Brisbaine | CA | 87803 | M |
601 | Addison Pratt | 78043 Tubai La | Large | OR | 43980 | M |
602 | Eliza Pratt | 78432 Pahi Rd | Rockville | MD | 29796 | F |
After loading in all of these rows into the table, you can run the following command to inspect all of the data in the table.
SELECT * FROM students_table
This will give you a listing of the rows as shown in figure 10.3.
Fig. 10.3 Selecting the rows in a table enables you to inspect them for correctness.
The following tables give you the rest of the data for the other tables.
Table 10.3
Table 10.3 Data for the teachers_table
No. | Name | Phone Number | Salary |
101 | Ralph Daniel | 798-0989 | 80000 |
102 | Sandy Turner | 897-9089 | 80001 |
103 | Lantie Hearn | 546-0397 | 65000 |
104 | John Baluch | 494-9787 | 66000 |
105 | Carl Kennedy | 777-4981 | 110000 |
To put this data into the teachers table, use the following SQL statement once for each row in the table:
Table 10.4Table 10.4 Data for the courses_table
Course No | Course Name | Department | Credits |
1001 | Underwater Basketweaving | Recreation | 7 |
1002 | The History of Rap | Music | 9 |
1003 | Country Music Theory | Music | 7 |
1004 | Paris on 3500 Calories a Day | PE | 3 |
1005 | Life after Baseball Strikes | PE | 7 |
To put this data into the teachers table, use the following SQL statement once for each row in the table:
Table 10.5
Table 10.5 Data for the sections_table
Course No. | Section No. | Teacher No. | No. of Students |
1001 | 1 | 105 | 23 |
1001 | 2 | 104 | 12 |
1002 | 1 | 101 | 45 |
1003 | 1 | 101 | 12 |
1004 | 1 | 102 | 9 |
1004 | 2 | 103 | 16 |
1004 | 3 | 105 | 77 |
1005 | 1 | 101 | 2 |
To put this data into the teachers table, use the following SQL statement once for each row in the table:
Table 10.6
Table 10.6 Data for the enrolls_table
Course No. | Section No. | Student No. | Grade |
1001 | 1 | 593 | 90 |
1001 | 1 | 594 | 87 |
1001 | 1 | 595 | 77 |
1001 | 2 | 596 | 99 |
1001 | 2 | 597 | 66 |
1001 | 2 | 598 | 79 |
1001 | 2 | 599 | 81 |
1002 | 1 | 600 | 89 |
1002 | 1 | 601 | 87 |
1002 | 1 | 602 | 75 |
1002 | 1 | 603 | 92 |
1003 | 1 | 603 | 100 |
1003 | 1 | 595 | 76 |
1003 | 1 | 596 | 78 |
1004 | 1 | 593 | 98 |
1004 | 1 | 594 | 65 |
1004 | 1 | 595 | 22 |
1004 | 2 | 596 | 67 |
1004 | 2 | 597 | 87 |
1005 | 1 | 593 | 86 |
1005 | 1 | 603 | 87 |
1005 | 1 | 600 | 83 |
1005 | 1 | 601 | 79 |
1005 | 1 | 599 | 73 |
1005 | 1 | 598 | 72 |
1005 | 1 | 597 | 91 |
1005 | 1 | 596 | 87 |
To put this data into the teachers table, use the following SQL statement once for each row in the table:
INSERT INTO enrolls_table
VALUES (1005,1,596,87)
At this point, you have five tables in the database, each containing a number of rows to experiment with. The rest of the chapter will be spent learning how to use this data.
One of the inadequacies of the SQL standard is in the area of schema changes. Standard SQL provides no way to change the composition of a table once it is created. Given the nature of real world projects, it is unlikely that even half of the tables in a database are complete and never need to add or remove columns.
Almost every vendor has had to come up with a way of performing this function. Transact SQL, DB/2, Oracle, and others support the ALTER command to provide this functionality. The syntax of the ALTER command is as follows:
The only operation that is allowed using the ALTER command is to add a column. The only columns that can be added are those that can contain nulls. The server creates space for the data in the table, but because it does not know what the values will be yet, it inserts nulls in every new field in every existing row.
Deleting a column is occasionally required in the real world to avoid a database filled with obsolete and unused data. There is no easy way to perform this function in SQL. The strategy most often employed is as follows:
1. Create a new table with the schema the way that you want it.
2. Copy the data from one table to the other using a query within an update.
3. Drop the original table after you are sure that the data transfer worked.
4. Finally, rename the new table to the same name as the old one.
The following example shows how this is done. The creation of the new table is a familiar process. Note that the new table name is created with the names of the columns exactly as they will appear in the finished table.
You now have two tables that contain fields about courses: courses_table, and new_courses_table. If you want a primary key assigned to the course# field in the new table, you must execute the sp_primarykey command.
Now, the new table is ready for data. There are a number of ways of getting the data from the first table to the second. The first way that comes to mind for most programmers is to select one row at the time, and perform n updates for n rows. This approach has the following three disadvantages:
The preferred approach is to use a SELECT statement within the UPDATE command to choose the columns that you want to update.
This syntax tells the database server to retrieve the rows and columns from the courses table that meet its specifications. This data is then handed to the INSERT statement on a column by column basis. The result is a copying of the data from one table to the other.
If you run a SELECT on the new table, you will see that it worked perfectly. All of the data that was needed in the new_courses_table has been moved into it.
Now you have two tables containing similar data. The courses_table has the old schema but the correct name, and the new_courses_table has the new schema and correct data but the old name.
The next step is to delete the original table from the database. This is done so that there will be no name conflict when you rename the new table to the old name.
Finally, you can rename the new_courses_table as courses_table without a name conflict.
The sp_rename command is specific to Transact SQL. Other dialects provide their own methods for accomplishing this. SQL Server gives the following confirmation that the change was successful:
Object name has been changed.
In order to convince yourself that this convoluted procedure worked, SELECT all rows and all columns from the courses_table and examine the results. Figure 10.4 shows the results of this procedure. The courses_table now has one fewer column than before, and one of the columns has a new name.
select * from courses_table
The basis of this procedure was the ability to use the set processing features of SQL to avoid a lot of coding and network traffic. The stronger SQL programmers understand set theory and the set operations of the language and make effective use of them. The result is a simpler system that executes faster as well.
The most common tasks in a database application are updating and retrieving data. Commonplace systems like an order handling system provide screens to a sales person who uses them to enter data. The warehouse pulls the order up on a screen, fills it and marks it as complete via another set of screens. The billing department uses the same database to prepare an invoice to be mailed to the customer. Finally, the accounts receivable department enters the payment into the database and the transaction is complete.
The basic flow of this sample system and nearly every database application is the creation of a row in a table, which is then updated several times before it is complete. Systems that handle trouble calls from customers follow a simple pattern. An operator logs a call into a database. A dispatcher sends a crew to perform the repair and logs that fact to the database. The crew completes the repair and logs that also.
These systems rely on the capability of one set of workers to update a common database which will be used by co-workers to perform their functions. These systems rely on the UPDATE command in SQL to change the values of the fields in the database. The syntax of the UPDATE command is as follows:
SELECT the data from this table before you execute this command. The values are as follows:
Next, run the UPDATE above and observe that the values in the table are now changed, as shown here:
Notice again that all of the rows in the table were treated as a set by SQL. The WHERE clause tells which rows are to be affected by the command. The UPDATE verb told which columns were to receive what value. Contrast this with the following third-generation language or procedural approach:
1. Read a row.
2. Test the value of the salary field.
3. If the salary < 71000, update it to 71000.
4. Write the row back to the file.
Comparing the simplicity of the UPDATE command with the complexity of the procedural approach highlights the value of the set processing mentality.
The UPDATE command can handle calculations as well. Suppose that in the previous example you wanted to give everyone a 5% raise. This can be done directly in SQL using the following syntax:
The result of this action is the following:
The absence of a WHERE clause caused the UPDATE to be applied to every row in the table in the same way that omitting a where clause in a SELECT statement causes every row in the table to be retrieved.
Updates are not limited to single values either. Several fields can be updated at the same time.
In this example, two fields in the same row were updated at the same time.
Complex UPDATE commands can be achieved by using a SELECT in the WHERE clause like you did above in the Insert example. In the following example, give the teachers whose salaries are below average a 2% raise. Don't give the ones that are at or above average any raise at all. If you think about how to do this using a procedural approach, it will look hard. Using the power of the set processing mentality, however, it becomes much simpler.
The before picture of the table shows the old salaries.
After running the UPDATE, the salaries of four of the teachers have been increased according to the logic of the SQL statement.
In this example also, you saw how much more powerful the set processing approach is than the procedural approach. In fact, much of the system design of a production application involves finding a strategy to avoid moving any more data than necessary over the network.
Deleting data from a table is very simple using SQL. If you want to delete the data in one field in one row, then the strategy would be to assign a 0, blank, or NULL value to that field.
The state of the data in the table before the execution of the table has the string abc in the new_column.
After the execution of the UPDATE command, the values of the last two student rows are changed to NULL according to the logic of the SQL statement.
The null values were allowed in new_column because the keyword NULL was used in the ALTER command that added this column.
The null value means that data is missing from the table, normally because it is unknown. Whenever a calculation like an average takes place on a column that has nulls, the null value is ignored in both the numerator and the denominator.
Often programmers are tempted to code applications where they store 0 in a numeric field whenever data is unknown. This can cause problems with procedures like the AVG() function. To illustrate this point, take the salaries of the teachers.
Suppose that you know the salary of every teacher except teacher# 105. In real systems, perhaps that salary hasn't been determined yet. Normally you would put 0 in order to be able to add the teacher to the database. This works fine when people read it because they know that people don't work for free, and so they are correctly able to determine that this data is missing. If they were averaging the numbers by hand, then they would be smart enough to omit that value and calculate the average of the other values. The average that they would calculate would be $81.074.97. But, if they were content to average numbers by hand, they wouldn't be paying you to automate their business, would they?
The answer that it gives is as follows (which is incorrect):
The computer literally averaged all of the fields in the database that it was instructed to average. Thus, the result was wrong.
The first thing to try would be to change the 0.0 to NULL. This is an illegal action because the salary field was defined as the default of NOT NULL when the table was created. SQL Server gives the following error message:
ALTER the table to add a new column that has the NULL attribute, and then copy the data from the salary column to this new column.
The table now looks like the following:
Update the table to copy the salaries into the new_salary field.
The result is the following table:
The result is a much more pleasing average that matches what a person would calculate when performing this task.
81074
Deleting a row from a table is very easy. It is accomplished by the following syntax:
DELETE FROM teachers_table
WHERE salary > 100000
The result is a "reengineered" workforce that is absent the highest-paid technical person.
You can see that the expected result was achieved. The offending employee has been dismissed from the teachers table.
By far, the most common operation on a database management system is a query. A query is a transaction whose goal is to retrieve data from the database. In SQL, the query verb is SELECT. SELECT statements follow the form:
An example of this would be:
The result of this query would be:
Another variation on this simple SELECT statement would be to retrieve all of the columns from all of the rows.
The results would be as follows:
Be selective in your use of the wildcard (*) in production systems. If you create a report using the * and format it to look just right, adding a row to the schema of the table will make your report look bad. In embedded SQL procedures, it is best to avoid the use of the *, even if you want all of the columns. The * is very useful, however, for interactive sessions where you are testing the data or some SQL syntax.
While you sometimes want to look at every row in a table, you often want only a subset of those rows. The SQL mechanism for telling the system which of these rows you want is the WHERE clause. The syntax of the WHERE clause is as follows:
An example of this would be as follows:
The result of this query is as follows:
The predicate in this SELECT statement is course# = 1003. It is also legal to have a predicate that returns more than one row, as in the following example:
The result of this query is the following two rows:
Developing skill in the writing of predicates is what separates the good SQL programmer from the average one. The rest of this chapter concerns predicate writing.
It is often necessary to satisfy multiple conditions in one SELECT statement. If you wanted all students who lived in Florida you would use the following:
The result is as follows:
If you wanted all male students, you would use the following:
The result is as follows:
One strategy would be to retrieve all of the rows from one table into your application, then select the ones that meet the second criteria. Alternately, you could use a compound SELECT statement and let the database management system do the work for you before it sends the result over the network.
The result is as follows:
The efficiency of your program is greatly enhanced when you allow the database management system to manage the selection of the rows for you. Modern database servers have rule-based query optimizers that analyze a request and create a strategy of retrieval that it predicts would be the fastest to execute. In this case, it would discover that the state field part of the predicate would yield the fewest candidate rows, two. It would then have only two rows to evaluate on the sex = 'M' part of the predicate.
Another flavor of the compound predicate contains the keyword OR. It is used when there are several possible values that would be acceptable. Suppose that a student wanted a ride home for the summer. Because they live in Georgia, another student from either Georgia or Florida would do, as Florida students drive through Georgia on the way home. This query could be done by brute force or it could be done as follows:
The result of this selection is as follows:
Parentheses are sometimes used to clarify the request to the database management system. The following request is ambiguous:
Does this SQL statement want students from 'GA' who are male and both male and female student from 'FL', or does it want male students from either state? It is likely that different servers could parse this sentence differently and give different answers. If you rewrite the sentence as follows:
then the confusion dissipates; the meaning becomes obvious. The result is as follows:
The Not keyword is also part of the legal syntax of the SQL predicate. If you were willing to go anywhere for spring break except 'NY', then you could say:
The result would be as follows:
Notice that all rows are present except the one student from the "Big Apple".
It is common to want to select a set of rows with a value of a certain column falling within a certain range, as in the following:
The result is as follows:
By the same token, you might want all of the rows that are not in that range:
SELECT student_name, zip FROM students_table
WHERE zip NOT BETWEEN '30000' AND '40000'
The result is as follows:
As expected, this query chose all of the rows with zip codes outside of that range.
Many times you want to query data where you have only part of the criteria necessary to completely specify the predicate. This necessitates the use of an approximation in the WHERE clause. The LIKE operator gives us that capability.
The result of this query is as follows:
Another variation on the LIKE theme is the NOT LIKE keywords.
The result of this query is—as you might predict—as follows:
Earlier, you saw how a WHERE clause could be used to find a student from 'GA' or 'FL'. The compound predicate worked well for two values but would become tedious if you needed to find students from nine or 10 states. To facilitate this type of query, a special keyword called IN is available. Instead of the following:
SELECT student_name, state from students_table WHERE state = 'FL' OR state = 'GA' OR state = 'AL' OR ...
You could use the IN keyword, as follows:
SELECT student_name, state from students_table WHERE state IN ( 'FL','GA','AL')
The result is the same as if you had strung together a set of OR's in the predicate.
As with the other predicate keywords, IN can be written NOT IN, as follows:
SELECT student_name, state from students_table WHERE state IN ( 'FL','GA','AL')
The result will be as follows:
If a field can contain nulls, there needs to be a way to check for the null value. Because the values actually representing NULL varies from system to system, a way of testing for NULL is required. SQL implements this through the IS NULL and the IS NOT NULL keyword phrases.
The result of this query is as follows:
The opposite effect can be achieved by using the following IS NOT NULL keyword phrase:
The result of this query is as follows:
Relational database theory states that there is no order to the rows in a table. This means that all ordering of rows must be done at retrieval time. Because people can understand ordered data much easier than they can understand unordered data, it is a requirement in some applications to sort data. SQL provides a built-in syntax, ORDER BY, to specify how you want the results of the query sorted.
The result of this query is as follows:
In similar fashion, you can order rows in descending order.
The results are now ordered in the opposite fashion.
There are times when you want to retrieve and display a calculation, without changing the data in the database. "What-if" logic is an example of this. What would happen to salaries if you gave everyone a raise of 4% instead of a raise of 5%?
In addition to the normal arithmetic operators (+ - * /) there are a number of aggregate operations built into SQL. MIN(), MAX(), SUM(), AVE(), and COUNT() are all supported. In addition, specific database management systems provide other functions.
The result of this query is as follows:
Another interesting query is as follows:
SELECT COUNT(course#) FROM sections_table
The result of this query is as follows:
If you do a SELECT on those course# fields in sections_table:
SELECT course# FROM sections_table
You will find that the course# field contains a lot of the same courses:
If the question being posed in the query is "How many different courses are being taught?", then the following query would be a better choice:
SELECT COUNT(DISTINCT course#) FROM sections_table
The result of this query might better serve your purposes.
-----------
5
Thus, there are five distinct course numbers being taught out of the eight sections.
Relational database theory states that no redundant data should be stored in the database. In other words, you would not store the name of the student with her student# in the enrolls field. You already have a way of finding out the name of a student given the student# from the students_table. However, if you do a SELECT on the enrolls table to display grades, you will get the following:
This report is accurate enough, but the teacher would rather have the names of the students on the report also. That will require a joining of two tables to accomplish this. The following query shows how SQL uses the WHERE clause to effect a join.
In order for a join to be successful, at least one field from each table must have the same domain. In the previous example, the student# fields in both the students_table and the enrolls_table both contained student numbers. This is by design. By performing the join, you were able to display the names of the fields next to the grades even though these two fields are in separate tables.
Joins can be combined with other where clauses in order to perform more complicated tasks. In the following example, only a subset of the rows are returned:
The result is as follows:
A join can also be created across three tables.
The result of this query is as follows:
In this rather complex example, there are two key elements. The first was the specification of which columns you want displayed. The fact that the sections_table columns were called out explicitly means that these rows will be the anchors and will determine which rows display. The second key point is the WHERE clause. Because you have three tables joined, the relationships between the three have to be specified. That is why two fields from the sections_table are in the where clause and only one field each from the other two tables.
Relational theory, if followed strictly, will sometimes cause five tables to be joined together.
This join works fine, but would be very hard to maintain. In reality, joins of more than two tables perform poorly in production applications. While relational theory states that there can be no redundant data in a database, real world systems often contain replicated data. In fact, modern systems support replicated data by handling the copying of the data automatically.
In this chapter you were introduced to the Structured Query Language (SQL). This language forms the user interface of most modern database management systems.
This chapter also examined how to use SQL to create modify and delete tables. You also learned how to add and delete data in the database. Finally, you learned how to retrieve the data in all sorts of clever ways.
| Previous Chapter | Next Chapter | Search | Table of Contents | Book Home Page |
| Buy This Book | Que Home Page | Digital Bookshelf | Disclaimer |
To order books from QUE, call us at 800-716-0044 or 317-361-5400.
For comments or technical support for our books and software, select Talk to Us.
© 1996, QUE Corporation, an imprint of Macmillan Publishing USA, a Simon and Schuster Company.