On Day 10, "Data Retrieval," you looked at the SELECT statement and performed basic queries on single tables. Today you look at more advanced topics, such as producing summary information with the aggregate functions GROUP BY, HAVING, COMPUTE, and COMPUTE BY. You also learn how to correlate data and perform subqueries. You finish with a look at more advanced queries, which involve selecting data from more than one table.
Aggregate functions can return summary values for an entire table or for groups of rows in a table. Aggregate functions are normally used in conjunction with the GROUP BY clause and are used in the HAVING clause or in the column_list. This may seem a little overwhelming at first, but hang in there. You are presented each piece in its basic form and then you learn how to add additional pieces one at a time. Table 11.1 lists the aggregate functions with their parameters and results.
Function | Results |
AVG([ALL | DISTINCT] column_name) | Average of the values in the numeric expression, either all or distinct |
COUNT(*) | Number of selected rows |
COUNT([ALL | DISTINCT] column_name) | Number of values in the expression, either all or distinct |
MAX(column_name) | Highest value in the expression |
MIN(column_name) | Lowest value in the expression |
SUM([ALL | DISTINCT] column_name) | Total of the values in the numeric expression, either all or distinct |
Here are a few examples using the aggregate functions in the pubs database.
SELECT COUNT(*) FROM employee ---------------- 43 (1 row(s) affected)
ANALYSIS: This query returned a count of the total number of rows in table employee.
SELECT MAX(ytd_sales) FROM titles -------------- 22246 (1 row(s) affected)
ANALYSIS: This query selected the maximum value found in the ytd_sales column from the titles table.
SELECT SUM(qty) FROM sales -------------- 493
ANALYSIS: This query selected all the values in the qty column of the sales table and added them.
The GROUP BY clause groups summary data that meets the WHERE clause criteria to be returned as single rows. The HAVING clause sets the criteria to determine which rows are returned by the GROUP BY clause. You could find out, for example, which books have more than one author and then return the book title and the authors for each book.
The syntax for the GROUP BY and HAVING:
SELECT column_list FROM table_list WHERE search_criteria [GROUP BY [ALL] non_aggregate_expression(s) [HAVING] search_criteria]
The HAVING clause has the same effect on the GROUP BY clause as the WHERE clause has on the SELECT statement. Here are some examples of using the GROUP BY and HAVING clauses:
SELECT title_id, count(title_id) AS Number_of_Authors FROM titleauthor GROUP BY title_id HAVING count(title_id) > 1 title_id Number_of_Authors ------ ------------------ BU1032 2 BU1111 2 MC3021 2 PC8888 2 PS1372 2 PS2091 2 TC7777 3
ANALYSIS: This query found all books that have more than one author and returned the title_id and number of authors for each book.
The GROUP BY ALL clause returns all groupings, including those not meeting the where clause criteria.
If you wanted to find all books with ytd sales of $4,000 or more and list all title IDs, you could submit the following query:
SELECT title_id, ytd_sales FROM titles WHERE (ytd_sales>=4000) GROUP BY ALL title_id, ytd_sales title_id ytd_sales -------- ---------- BU1032 4095 BU1111 3876 BU2075 18722 BU7832 4095 MC2222 2032 MC3021 22246 MC3026 (null) PC1035 8780 PC8888 4095 PC9999 (null) PS1372 375 PS2091 2045 PS2106 111 PS3333 4072 PS7777 3336 TC3218 375 TC4203 15096 TC7777 4095
ANALYSIS: Notice that all books that don't meet the WHERE clause criteria are still listed.
NOTE: The GROUP BY and HAVING clauses must meet certain requirements in order to comply with ANSI standards. One of these requirements is that the GROUP BY clause must contain all non-aggregate columns from the SELECT column_list. Another is that the HAVING clause criteria columns return only one value.
The COMPUTE and COMPUTE BY clauses are used to produce new rows of summary and detail data. They use the aggregate functions specified earlier. The COMPUTE clause returns detail rows and a grand total summary row. The COMPUTE BY clause returns new rows of summary data, much like the GROUP BY clause, but it returns the rows as subgroups with summary values.
The syntax for the COMPUTE and COMPUTE BY clauses is:
SELECT column_list FROM table_list WHERE search_criteria [COMPUTE] aggregate_expression(s) [BY] column_list]
Here are a couple of examples. You are going to compare the COMPUTE BY clause with the GROUP BY clause.
GROUP BY example:
SELECT type, SUM(ytd_sales) FROM titles GROUP BY type type ytd_sales ---- --------- business 30788 mod_cook 24278 popular_comp 12875 psychology 9939 trad_cook 15471 UNDECIDED NULL
If you use the COMPUTE BY clause, you must also include the ORDER BY clause. Now, if your ORDER BY clause is:
ORDER BY title_id, pub_id, au_id
then your COMPUTE BY clause can be one of the following:
COMPUTE aggregate_function (column_name) BY title_id, pub_id, au_id
or
COMPUTE aggregate_function (column_name) BY title_id, pub_id
or
COMPUTE aggregate_function (column_name) BY title_id
As you can see, the columns listed in the COMPUTE BY clause must be the same as, or a subset of the columns in the ORDER BY clause. The order of the columns in the COMPUTE BY clause must also be in the same order as those in the ORDER BY clause and you cannot skip columns. Enter the following and look at the results.
SELECT type, ytd_sales FROM titles ORDER BY type COMPUTE SUM(ytd_sales) BY type type ytd_sales ------------ ---------- business 4095 business 3876 business 18722 business 4095 sum ========= 30788 type ytd_sales ------------ ---------- mod_cook 2032 mod_cook 22246 sum ========= 24278 type ytd_sales ------------ ---------- popular_comp 8780 popular_comp 4095 popular_comp (null) sum ========= 12875 type ytd_sales --------- ---------- psychology 375 psychology 2045 psychology 111 psychology 4072 psychology 3336 sum ========= 9939 type ytd_sales --------- ---------- trad_cook 375 trad_cook 15096 trad_cook 4095 sum ========= 19566 type ytd_sales --------- ---------- UNDECIDED (null) sum ========= (null) (24 row(s) affected)
NOTE: The COMPUTE and COMPUTE BY clauses produce new rows of non-relational data. Therefore, they cannot be used with the SELECT INTO statement to create new tables.
You cannot use COMPUTE and COMPUTE BY with text or image datatypes because these are non-sortable.
To produce additional summary rows, referred to as super-aggregates, you can use the ROLLUP and CUBE operators. The ROLLUP and CUBE operators are new to SQL Server 6.5 and are used in conjunction with the GROUP BY clause.
The syntax for the ROLLUP and CUBE operators is:
SELECT column_list FROM table_list WHERE search_criteria [GROUP BY [ALL] non_aggregate_expression(s) [WITH {ROLLUP | CUBE}]]
The ROLLUP operator is typically used to produce running averages or running sums. This is done by applying the aggregate function in the SELECT column_list to each column in the GROUP BY clause, moving from left to right. What does that mean? This is most easily understood with an example.
SELECT type, pub_id, SUM(ytd_sales) AS ytd_sales FROM titles GROUP BY type, pub_id WITH ROLLUP type pub_id ytd_sales ----- ------ ---------- business 736 18722 business 1389 12066 business (null) 30788 mod_cook 877 24278 mod_cook (null) 24278 popular_comp 1389 12875 popular_comp (null) 12875 psychology 736 9564 psychology 877 375 psychology (null) 9939 trad_cook 877 19566 trad_cook (null) 19566 UNDECIDED 877 (null) UNDECIDED (null) (null) (null) (null) 97446 (15 row(s) affected)
ANALYSIS: The ROLLUP operator produced a row in your output for each row in the titles table with a single type and pub_id. It then showed you the ytd_sales for each item. It then produced an additional row for each type with summary information. In your example, the rows with (null) in the pub_id field display the SUM of all of the ytd_sales for that group of types.
Look at it in layman's terms: In the titles table, there are two rows that had both a business type and a unique pub_id. (In the real table, there are a total of four books of type business with one author, pub_id, writing three books and another author writing the fourth). Each author who wrote business books had ytd_sales of 18,722 and 12,066 respectively.
The ROLLUP operator then created a subtotal field that summed all of the Business-type books, (18,722 + 12,066) 30,788. The query then did the same thing for each group of book types and authors in the table and then gave a grand total (97,446), signified by a (Null) value in both the type and pub_id fields.
The CUBE operator produces super-aggregate rows by using every possible combination of the columns in the GROUP BY clause. Like the ROLLUP operator, the CUBE operator produces the running averages and running sums, but also cross-references columns to return additional summary rows. Here is an example:
SELECT type, pub_id, SUM(ytd_sales) AS ytd_sales FROM titles GROUP BY type, pub_id WITH CUBE type pub_id ytd_sales ------------ ------ ---------- business 736 18722 business 1389 12066 business (null) 30788 mod_cook 877 24278 mod_cook (null) 24278 popular_comp 1389 12875 popular_comp (null) 12875 psychology 736 9564 psychology 877 375 psychology (null) 9939 trad_cook 877 19566 trad_cook (null) 19566 UNDECIDED 877 (null) UNDECIDED (null) (null) (null) (null) 97446 (null) 736 28286 (null) 877 44219 (null) 1389 24941 (18 row(s) affected)
ANALYSIS: The CUBE operator produced a row in your output for each row in the titles table with a single type and pub_id. It then showed you the ytd_sales for each item. It then produced an additional row for each type with summary information and another row for each pub_id. In this example, the rows with (null) in the pub_id field display the SUM of all of the ytd_sales for that group of types; the rows with (null) in the type field display the SUM of all of the ytd_sales for that group of pub_ids.
When using the ROLLUP or CUBE, some restrictions are placed on the GROUP BY clause. You can have a maximum of 10 columns in the GROUP BY clause and the sum of the sizes of those columns cannot exceed 900 bytes in size and you cannot use the GROUP BY ALL clause.
NOTE: The ROLLUP and CUBE operators produce new rows of non-relational data. Therefore, you cannot use them with the SELECT INTO statement. Also, you cannot use ROLLUP and CUBE operators with text or image datatypes, as these are non-sortable.
In this section, you look at implementing joins to retrieve data from two or more tables. The results will appear as a single table with columns from all the tables specified in the SELECT column_list and meeting the search criteria. First you will look at how to implement joins using both ANSI and SQL Server syntax, and then the different types of joins: inner joins, cross joins, outer joins, and self joins.
To join tables, a comparison of one or more columns from a table to one or more columns in one or more tables needs to be made. The result of the comparison produces new rows by combining the columns in the SELECT column_list from the joined tables that meet the join conditions. When you join tables, you can use either ANSI or SQL Server join syntax. The syntax for both ANSI and SQL Server syntax are as follows:
ANSI join syntax:
SELECT table_name.column_name, [...] FROM {table_name [join_type] JOIN table_name ON search_criteria}, [...] WHERE search_criteria
The join statements for the ANSI syntax show up in the FROM clause of the SELECT statement. The WHERE clause selects rows from the joined rows to be returned. There are three types of ANSI join statements you can choose: INNER JOIN, OUTER JOIN, and CROSS JOIN.
SQL Server join syntax:
SELECT table_name.column_name, [...] FROM table_list WHERE table_name.column_name join_operator table_name.column_name, [...]
In SQL Server syntax, the FROM clause lists the tables involved in the join. The WHERE clause includes the columns to be joined and can include additional search criteria that determines the rows to be returned. The join operators for SQL Server syntax are: =, >, <, >=, <=, <>, !>, and !<.
Joins connect two tables based on a join condition producing results as a new table with the rows that satisfy the join condition. Inner joins produce information where matching information is found in both tables. The most common types of inner joins are equijoins and natural joins.
In an equijoin, column values are compared for equality and redundant columns are displayed as columns in the result set whereas in a natural join, the redundant columns are not displayed twice.
Take a look at the following example to clarify what was just said.
SQL Server syntax:
SELECT * FROM publishers, pub_info WHERE publishers.pub_id = pub_info.pub_id
ANSI syntax:
SELECT * FROM publishers INNER JOIN pub_info ON publishers.pub_id = pub_info.pub_id
Due to page width constraints, the following table is split into two parts. However, as output, this should be on one line.
pub_id pub_name city state country pub_id logo pr_info ---- -------- ------ ---- ---- ---- -------- -------- 736 New Moon Books Boston MA USA 736 NEWMOON.BMP New Moon Books... 877 Binnet & Hardley Washington DC USA 877 BINNET.BMP Binnet & Hardley... 1389 Algodata Infosystems Berkeley CA USA 1389 ALGODATA.BMP Algodata Infosystem 1622 Five Lakes Publishing Chicago IL USA 1622 5LAKES.BMP Five Lakes Publishing 1756 Ramona Publishers Dallas TX USA 1756 RAMONA.BMP Ramona Publishers 9901 GGG&G München NULL GER 9901 GGGG.BMP GGG&G... 9952 Scootney Books New York NY USA 9952 SCOOTNEY.BMP Scootney Books... 9999 Lucerne Publishing Paris NULL FRA 9999 LUCERNE.BMP Lucerne Publishing... (8 row(s) affected)
In this example the SELECT statement selects all columns from both the publishers and pub_info tables where the pub_id columns for the joined tables are equal. Note the redundant pub_id column.
In a natural join, column values are compared for equality, but redundant columns are eliminated from the columns in the result set.
Here is an example of a natural join.
SQL Server syntax:
SELECT publishers.*, pub_info.logo, pub_info.pr_info FROM publishers, pub_info WHERE publishers.pub_id = pub_info.pub_id
ANSI syntax:
SELECT publishers.*, pub_info.logo, pub_info.pr_info FROM publishers INNER JOIN pub_info ON publishers.pub_id = pub_info.pub_id pub_id pub_name city state country ... ---- -------- -------- ---- ---- ... 736 New Moon Books Boston MA USA ... 877 Binnet & Hardley Washington DC USA ... 1389 Algodata Infosystems Berkeley CA USA ... 1622 Five Lakes Publishing Chicago IL USA ... 1756 Ramona Publishers Dallas TX USA ... 9901 GGG&G München NULL GER ... 9952 Scootney Books New York NY USA ... 9999 Lucerne Publishing Paris NULL FRA ... (8 row(s) affected)
In this example, the SELECT statement selects all columns from the publishers table and all columns except pub_id from the pub_info table.
Cross or unrestricted joins return a combination of all rows of all tables in the join as the result set. A cross or unrestricted join is created not by using the WHERE clause in the SQL Server join of two or more tables, but by using the CROSS JOIN keyword for the ANSI join.
The combining of all rows from all tables involved in the join yields what is known as a Cartesian product. In most cases this type of result set is unusable, unless your intention was to find every possible combination, such as some type of statistical or mathematical analysis. To put it another way, if you look at each table as a matrix and then you multiply the matrices, you get a new matrix with all combinations as shown in Figure 11.1. Each row from Table1 is added to each row in Table2. If you add the number of columns from both tables, you get the resulting number of columns. If you multiply the number of rows in Table1 by the number of rows in Table2, you get the total number of rows returned by your query.
Figure 11.1. Creating a Cartesian product.
NOTE: Tables cannot be joined on text or image columns. You can, however, compare the lengths of text columns from two tables with a WHERE clause, but not actual data.
Here is an example of creating a Cartesian product using the CROSS JOIN or unrestricted join. Say you want to list all book titles and their authors' IDs, and you submit the following query:
SQL Server syntax:
SELECT titles.title, titleauthor.au_id FROM titles, titleauthor
ANSI syntax:
SELECT titles.title, titleauthor.au_id FROM titles CROSS JOIN titleauthor title au_id -------------- ---------- The Busy Executive's Database Guide 172-32-1176 The Busy Executive's Database Guide 213-46-8915 . . . . . . Sushi, Anyone? 998-72-3567 Sushi, Anyone? 998-72-3567 (450 row(s) affected)
The results of the query you submitted yielded 450 rows. There are 18 rows in the titles table and 25 rows in the titleauthor table. Since an unrestricted or cross join returns all possible combinations, you get 18x25 = 450 rows. Not quite what you wanted as your desired results, right?
To avoid submitting an unrestricted join, you should take the number of tables you are joining and subtract 1. N-1 is the number of join clauses needed, where N is the number of tables involved in the join (in other words, three tables, 3 - 1 = 2, two join clauses). The number of join clauses may be more if your are joining based on a composite key.
You can restrict rows from one table while allowing all rows from another table as your result set by using outer joins. One of the most common uses for this type of join is to search for orphan records. You can create an outer join statement using either SQL Server or ANSI syntax. The outer join operators and keywords are as follows:
SQL Server syntax:
*= | Includes all rows from the first table and only the matching rows in the second table (left outer join). |
=* | Includes all rows from the second table and only the matching rows in the first table (right outer join). |
ANSI syntax:
LEFT OUTER JOIN | Includes all rows from the first table and only the matching rows in the second table. |
RIGHT OUTER JOIN | Includes all rows from the second table and only the matching rows in the first table. |
FULL OUTER JOIN | Includes all non-matching rows from both tables. |
You have a table of customers and a table with orders. These two tables are related by a CustomerID field. With an equijoin or natural join, you would only return records where the CustomerID field has a match in both tables. Outer joins can be handy to get a Customer list and if a customer happens to have an order, that order information also shows up. If they do not have an order, then the information from the orders table shows up as (null).
If you did a left outer join on these tables and specified the Customers table first, then the desired results are returned. If you specify a right outer join, then your results would show all Orders and if an Order happens to have a CustomerID that does not match a CustomerID in the Customers table, then the Customer information would be (null). (If you follow the rules of referential integrity, you should never have an order without a valid CustomerID. If this is the case, then your right outer join would have the same results as an equijoin or natural join, all Orders and Customers where there is a match on CustomerID.)
The left and right outer joins can return the same results dependent upon the table order. For example, these two joins return the same information: Customers.CustomerID *= Orders.CustomerID and Orders.CustomerID =* Customers.CustomerID.
If you wanted to find all the titles and if they happen to have sold any copies, or the number of copies sold, you could submit the following query:
SQL Server syntax:
SELECT titles.title_id, titles.title, sales.qty FROM titles, sales WHERE titles.title_id *= sales.title_id
ANSI syntax:
SELECT titles.title_id, titles.title, sales.qty FROM titles LEFT OUTER JOIN sales ON titles.title_id = sales.title_id title_id title qty -------- -------------------------------------- --- BU1032 The Busy Executive's D... 5 BU1032 The Busy Executive's D... 10 BU1111 Cooking with Computers... 25 BU2075 You Can Combat Compute... 35 BU7832 Straight Talk About Co... 15 MC2222 Silicon Valley Gastron... 10 MC3021 The Gourmet Microwave 25 MC3021 The Gourmet Microwave 15 MC3026 The Psychology of Comp... (null) PC1035 But Is It User Friendl... 30 PC8888 Secrets of Silicon Val... 50 PC9999 Net Etiquette (null) PS1372 Computer Phobic AND No... 20 PS2091 Is Anger the Enemy? 3 PS2091 Is Anger the Enemy? 75 PS2091 Is Anger the Enemy? 10 PS2091 Is Anger the Enemy? 20 PS2106 Life Without Fear 25 PS3333 Prolonged Data Deprivat... 15 PS7777 Emotional Security: A ... 25 TC3218 Onions, Leeks, and Gar... 40 TC4203 Fifty Years in Bucking... 20 TC7777 Sushi, Anyone? 20 (23 row(s) affected)
When using an outer join with SQL Server syntax using a NULL comparison, SQL Server yields unexpected results. For all rows in the outer table that do not match rows in the inner table based upon the NULL comparison, NULL values are returned. The ANSI syntax yields the expected results.
SQL Server syntax:
SELECT title, stor_id, qty FROM titles, sales WHERE titles.title_id *= sales.title_id AND qty IS NULL title stor_id qty ------------------------------------------- ------- ---- The Busy Executive's Database Guide (null) (null) Cooking with Computers: Surreptitious ... (null) (null) You Can Combat Computer Stress! (null) (null) . . . . . . . . . Onions, Leeks, and Garlic: Cooking Secrets ... (null) (null) Fifty Years in Buckingham Palace Kitchens (null) (null) Sushi, Anyone? (null) (null) (18 row(s) affected)
ANSI syntax:
SELECT title, stor_id, qty FROM titles LEFT OUTER JOIN sales ON titles.title_id = sales.title_id WHERE qty IS NULL title stor_id qty ------------------------------ ------ ------ The Psychology of Computer Cooking (null) (null) Net Etiquette (null) (null) (2 row(s) affected)
As the name implies, a self join correlates rows of a table with other rows in the same table. Comparison queries for the same information are used most often for self joins. If you want to list all authors that live in the same city and zip code, for example, you will compare city and zip by executing the following query:
SQL Server syntax:
SELECT au1.au_fname, au1.au_lname, au2.au_fname, au2.au_lname, au1.city, au1.zip FROM authors au1, authors au2 WHERE au1.city = au2.city AND au1.zip = au2.zip AND au1.au_id < au2.au_id ORDER BY au1.city, au1.zip
ANSI syntax:
SELECT au1.au_fname, au1.au_lname, au2.au_fname, au2.au_lname, au1.city, au1.zip FROM authors au1 INNER JOIN authors au2 ON au1.city = au2.city AND au1.zip = au2.zip WHERE au1.au_id < au2.au_id ORDER BY au1.city, au1.zip au_fname au_lname au_fname au_lname city zip -------- -------- -------- -------- ------------ ------ Cheryl Carson Abraham Bennet Berkeley 94705 Dean Straight Dirk Stringer Oakland 94609 Dean Straight Livia Karsen Oakland 94609 Dirk Stringer Livia Karsen Oakland 94609 Ann Dull Sheryl Hunter Palo Alto 94301 Anne Ringer Albert Ringer Salt Lake City 84152 (6 row(s) affected)
ANALYSIS: Notice that when you do a self join on a table, you create an alias for the table name. This is done so that one table is treated logically as two tables.
TIP: A table alias is useful anytime you do a multi-table join operation. It enables you to create a more readable and shorter query statement, because you reference the table alias instead of the table name.
If you're still having problems with all this syntax, you're in luck. Included with the SQL Server 6.5 utilities is MS Query. MS Query is a tool that enables you to build your queries visually as shown in Day 3, "SQL Server Tools and Utilities."
Once you have built your query visually, you can cut and paste the SQL syntax generated by MS Query into ISQL/w. By pressing the SQL button in the MS Query utility, you can see the statements in Access-SQL, which is the version of SQL that Microsoft Access uses.
You must remove the quotation marks from the query syntax generated by MS Query. Otherwise you receive a syntax error when you try to run it from within ISQL/w or the Query tool in the Enterprise Manager.
You can also use table aliases as shown on Day 10, "Data Retrieval," to make the SQL statement shorter. The following example is pasted from the MS Query SQL syntax and then modified using table aliases.
SELECT a.au_lname, a.au_fname, t.title, t.price FROM authors a, titleauthor ta, titles t WHERE ta.au_id = a.au_id AND t.title_id = ta.title_id
A SELECT statement nested inside another SELECT statement is commonly referred to as a subquery. Subqueries can produce the same results as a join operation. In this section, you will look at how the subquery is used, the types of subqueries, subquery restrictions, and correlated subqueries.
A SELECT statement can be nested within another SELECT, INSERT, UPDATE, or DELETE statement. If the subquery returns a single value, such as an aggregate, it can be used anywhere a single value can be used. If the subquery returns a list, such as a single column of many values, it can only be used in the WHERE clause.
In many cases a join operation can be used instead of a subquery, but some instances can only be processed as a subquery. In some cases a join operation can yield better performance than a subquery, but generally there is little performance difference.
The subquery is always enclosed within parentheses and it completes before the outer query is processed, unless there is a correlated subquery. A subquery can contain another subquery and that subquery can contain a subquery and so on. There is no practical limit to the number of subqueries that can be processed other than system resources.
The syntax for a nested SELECT statement:
(SELECT [ALL | DISTINCT] subquery_column_list [FROM table_list] [WHERE clause] [GROUP BY clause] [HAVING clause])
A subquery can be used to return a single column or single value anywhere a single value expression can be used, and a subquery can be compared against using the following operators: =, <, >, <=, >=, <>, !>, and !<. It can also return a single column or many values that can be used with the IN list comparison operator in the WHERE clause. A subquery can also return many rows that are used for an existence check by using the EXISTS keyword in the WHERE clause.
To find all the authors that live in the same state as the bookstore that sells their publisher's books, for example, you could run either of the following queries.
SELECT DISTINCT au_fname, au_lname, state FROM authors WHERE state IN (SELECT state FROM stores)
or
SELECT DISTINCT au_fname, au_lname, state FROM authors WHERE EXISTS (SELECT * FROM stores WHERE state = authors.state) au_fname au_lname state --------- -------- ----- Abraham Bennet CA Akiko Yokomoto CA Ann Dull CA Burt Gringlesby CA Charlene Locksley CA Cheryl Carson CA Dean Straight CA Dirk Stringer CA Heather McBadden CA Johnson White CA Livia Karsen CA Marjorie Green CA Michael O'Leary CA Reginald Blotchet-Halls OR Sheryl Hunter CA Stearns MacFeather CA (16 row(s) affected)
Restrictions exist on what you can do with subqueries. This list provides the rules by which you can create and use a subquery.
A correlated subquery references a table from the outer query and evaluates each row for the outer query. In this aspect a correlated subquery differs from a normal subquery, because the subquery depends on values from the outer query. A normal subquery is executed independently of the outer query.
In the following example, the join query is rewritten as a correlated subquery. The queries return the same information and answer the question: who are the authors that live in the same city and zip code?
SQL Server JOIN syntax:
SELECT au1.au_fname, au1.au_lname, au2.au_fname, au2.au_lname, au1.city, au1.zip FROM authors au1, authors au2 WHERE au1.city = au2.city AND au1.zip = au2.zip AND au1.au_id < au2.au_id ORDER BY au1.city, au1.zip au_fname au_lname au_fname au_lname city zip ------ ------ ------ ------ ---------- ------ Cheryl Carson Abraham Bennet Berkeley 94705 Dean Straight Dirk Stringer Oakland 94609 Dean Straight Livia Karsen Oakland 94609 Dirk Stringer Livia Karsen Oakland 94609 Ann Dull Sheryl Hunter Palo Alto 94301 Anne Ringer Albert Ringer Salt Lake City 84152 (6 row(s) affected)
SQL Server Correlated Subquery syntax:
SELECT au1.au_fname, au1.au_lname, au1.city, au1.zip FROM authors au1 WHERE zip IN (SELECT zip FROM authors au2 WHERE au1.city = au2.city AND au1.au_id <> au2.au_id) ORDER BY au1.city, au1.zip au_fname au_lname city zip -------- ------------ -------------- -------- Abraham Bennet Berkeley 94705 Cheryl Carson Berkeley 94705 Livia Karsen Oakland 94609 Dirk Stringer Oakland 94609 Dean Straight Oakland 94609 Sheryl Hunter Palo Alto 94301 Ann Dull Palo Alto 94301 Albert Ringer Salt Lake City 84152 Anne Ringer Salt Lake City 84152 (9 row(s) affected)
Notice that the same data is returned; it is simply formatted differently and is more readable.
The SELECT INTO statement enables you to create a new table based on query results. The new table is based on the columns you specify in the select list, the table(s) you name in the FROM clause, and the rows you choose in the WHERE clause. You can create two types of tables with a SELECT INTO statement: permanent and temporary. The syntax for the SELECT INTO is as follows:
SELECT column_list INTO new_table_name FROM table_list WHERE search_criteria
When creating a permanent table, the SELECT INTO/BULK COPY database option must be set. The SELECT INTO statement enables you to define a table and put data into it without going through the usual data definition process. The name of the new table must be unique within the database and must conform to the rules for SQL Server naming conventions. If columns in the column_list of your SELECT statement have no titles, such as derived columns like aggregate functions, the columns in the new table will have no names; however, there are two problems with this.
You can also use the SELECT INTO statement to create temporary tables, of which there are two types: local and global. A local temporary table is available only during the current user session to SQL Server and is deallocated when the session is terminated. A local temporary table is created by preceding the new table name with the # symbol. A global temporary table is available to all user sessions to SQL Server and is deallocated when the last user session accessing the table is terminated. A global temporary table is created by preceding the new table name with two ## symbols. These temporary tables reside in the tempdb database.
SELECT title_id, title, price INTO #tmpTitles FROM titles GO SELECT * FROM #tmpTitles GO (18 row(s) affected) title_id Title price -------- ---------------------------------- -------- BU1032 The Busy Executive's Database Guide 19.99 BU1111 Cooking with Computers: Surreptitious... 11.95 BU2075 You Can Combat Computer Stress! 2.99 . . . . . . . . . TC3218 Onions, Leeks, and Garlic: Cooking Secrets... 20.95 TC4203 Fifty Years in Buckingham Palace Kitchens 11.95 TC7777 Sushi, Anyone? 14.99 (18 row(s) affected)
TIP: The purpose of the SELECT INTO statement is to create a new table. If you want to add rows to a preexisting table, use the INSERT statement or INSERT INTO.
You can combine the results of two or more queries into a single result set by using the UNION operator. By default, duplicate rows are eliminated; however, using UNION with the ALL keyword returns all rows, including duplicates.
SELECT column_list [INTO clause] [FROM clause] [WHERE clause] [GROUP BY clause] [HAVING clause] [UNION [ALL] SELECT column_list [FROM clause] [WHERE clause] [GROUP BY clause] [HAVING clause]...] [ORDER BY clause] [COMPUTE clause]
Here are the rules for using the UNION operator:
SELECT title, stor_name, ord_date, qty FROM titles, sales, stores WHERE titles.title_id = sales.title_id AND stores.stor_id = sales.stor_id UNION SELECT title, `No Sales', NULL, NULL FROM titles WHERE title_id NOT IN (SELECT title_id FROM sales) ORDER BY qty title stor_name ord_date qty -------------- ---------- -------- --- Net Etiquette No Sales (null) (null) The Psychology... No Sales (null) (null) Is Anger the Enemy? Eric the Read Books Sep 13 1994... 3 . . . . . . . . . . . . Onions, Leeks, and Garlic:... News & Brews Jun 15 19... 40 Secrets of Silicon Valley Barnum's May 24 1993... 50 Is Anger the Enemy? Barnum's Sep 13 1994... 75 (23 row(s) affected)
In this chapter you learned about aggregate functions like SUM and AVG and how to use them in a SELECT statement. Remember that aggregate functions can return summary values for an entire table or for groups of rows in a table. They are also normally used in conjunction with the GROUP BY or HAVING clauses, or in the column_list.
GROUP BY clauses group summary data that meets the WHERE clause criteria, to be returned as single rows. The HAVING clauses can also be used to set the criteria to be returned by the GROUP BY clause.
You also learned about the COMPUTE and COMPUTE BY clauses that can be used to produce new rows of summary and detail data. The COMPUTE clause is useful as it returns detail rows and a grand total summary row. The COMPUTE BY clause returns rows as subgroups with summary values.
In order to produce additional summary rows, which are often called super-aggregates, you can use the ROLLUP and CUBE operators in conjunction with the GROUP BY clause.
The ROLLUP operator is typically used to produce running averages or running sums. The CUBE operator produces super-aggregate rows by using every possible combination of the columns in the GROUP BY clause.
You also learned how to join tables using both the ANSI syntax as well as the SQL Server syntax. The join statements for the ANSI syntax show up in the FROM clause of the SELECT statement, whereas the WHERE clause is used for the SQL Server syntax. There are three types of ANSI join statements: INNER JOIN, OUTER JOIN and CROSS JOIN. Inner Joins are either equijoins or natural joins and return information where values in two tables are identical. Outer joins can show you all information from one table and, if the associated table has values that match, they will be displayed as well. Cross or unrestricted joins return a combination of all rows of all tables in the join as the result set.
You also learned about creating subqueries. This occurs when you nest a SELECT statement inside another SELECT statement. Subqueries can produce the same results as a join operation.
The SELECT INTO statement enables you to create a new table based on query results. You can create either permanent tables or temporary tables. When you create a permanent table, you must have the SELECT INTO/Bulkcopy option set to true for the database. To create temporary tables, you prefix the table name with a # for a local temporary table, or a ## symbol for a global temporary table.
Finally, you learned how to combine the results of two or more queries into a single result set by using the UNION operator.
SELECT SUM(qty) FROM sales
The Workshop provides quiz questions to help you solidify your understanding of the material covered, as well as exercises to provide you with experience in using what you've learned. Try to understand the quiz and exercise answers before continuing on to the next day's lesson. Answers are provided in Appendix B.
© Copyright, Macmillan Computer Publishing. All rights reserved.