Teach Yourself Microsoft SQL Server 6.5 in 21 Days

Previous chapterNext chapterContents


Day 11

Data Retrieval (Advanced Topics)


On Day 10, "Data Retrieval," you looked at the SELECT statement and performed basic queries on single tables. Today you look at more advanced topics, such as producing summary information with the aggregate functions GROUP BY, HAVING, COMPUTE, and COMPUTE BY. You also learn how to correlate data and perform subqueries. You finish with a look at more advanced queries, which involve selecting data from more than one table.

Aggregate Functions

Aggregate functions can return summary values for an entire table or for groups of rows in a table. Aggregate functions are normally used in conjunction with the GROUP BY clause and are used in the HAVING clause or in the column_list. This may seem a little overwhelming at first, but hang in there. You are presented each piece in its basic form and then you learn how to add additional pieces one at a time. Table 11.1 lists the aggregate functions with their parameters and results.

Table 11.1. Aggregate functions.

Function Results
AVG([ALL  |  DISTINCT] column_name) Average of the values in the numeric expression, either all or distinct
COUNT(*) Number of selected rows
COUNT([ALL  |  DISTINCT] column_name) Number of values in the expression, either all or distinct
MAX(column_name) Highest value in the expression
MIN(column_name) Lowest value in the expression
SUM([ALL  |  DISTINCT] column_name) Total of the values in the numeric expression, either all or distinct

Here are a few examples using the aggregate functions in the pubs database.

SELECT COUNT(*)
FROM employee
----------------
      43
(1 row(s) affected) 

ANALYSIS: This query returned a count of the total number of rows in table employee.

SELECT MAX(ytd_sales)
FROM titles
--------------
22246
(1 row(s) affected)

ANALYSIS: This query selected the maximum value found in the ytd_sales column from the titles table.

SELECT SUM(qty)
FROM sales
--------------
493

ANALYSIS: This query selected all the values in the qty column of the sales table and added them.

GROUP BY and HAVING

The GROUP BY clause groups summary data that meets the WHERE clause criteria to be returned as single rows. The HAVING clause sets the criteria to determine which rows are returned by the GROUP BY clause. You could find out, for example, which books have more than one author and then return the book title and the authors for each book.

The syntax for the GROUP BY and HAVING:

SELECT column_list
FROM table_list
WHERE search_criteria
[GROUP BY [ALL] non_aggregate_expression(s)
[HAVING] search_criteria]

The HAVING clause has the same effect on the GROUP BY clause as the WHERE clause has on the SELECT statement. Here are some examples of using the GROUP BY and HAVING clauses:

SELECT title_id, count(title_id) AS Number_of_Authors
FROM titleauthor
GROUP BY title_id
HAVING count(title_id) > 1
title_id    Number_of_Authors
------      ------------------
BU1032      2
BU1111      2
MC3021      2
PC8888      2
PS1372      2
PS2091      2
TC7777      3

ANALYSIS: This query found all books that have more than one author and returned the title_id and number of authors for each book.

The GROUP BY ALL clause returns all groupings, including those not meeting the where clause criteria.

If you wanted to find all books with ytd sales of $4,000 or more and list all title IDs, you could submit the following query:

SELECT title_id, ytd_sales
FROM titles
WHERE (ytd_sales>=4000)
GROUP BY ALL title_id, ytd_sales
title_id     ytd_sales
--------     ----------
BU1032       4095
BU1111       3876
BU2075       18722
BU7832       4095
MC2222       2032
MC3021       22246
MC3026       (null)
PC1035       8780
PC8888       4095
PC9999       (null)
PS1372       375
PS2091       2045
PS2106       111
PS3333       4072
PS7777       3336
TC3218       375
TC4203       15096
TC7777       4095

ANALYSIS: Notice that all books that don't meet the WHERE clause criteria are still listed.


NOTE: The GROUP BY and HAVING clauses must meet certain requirements in order to comply with ANSI standards. One of these requirements is that the GROUP BY clause must contain all non-aggregate columns from the SELECT column_list. Another is that the HAVING clause criteria columns return only one value.

COMPUTE and COMPUTE BY

The COMPUTE and COMPUTE BY clauses are used to produce new rows of summary and detail data. They use the aggregate functions specified earlier. The COMPUTE clause returns detail rows and a grand total summary row. The COMPUTE BY clause returns new rows of summary data, much like the GROUP BY clause, but it returns the rows as subgroups with summary values.

The syntax for the COMPUTE and COMPUTE BY clauses is:

SELECT column_list
FROM table_list
WHERE search_criteria
[COMPUTE] aggregate_expression(s)
[BY] column_list]

Here are a couple of examples. You are going to compare the COMPUTE BY clause with the GROUP BY clause.

GROUP BY example:

SELECT type, SUM(ytd_sales)
FROM titles
GROUP BY type
type    	ytd_sales
----	---------
business    	30788
mod_cook    	24278
popular_comp   	12875
psychology    	9939
trad_cook    	15471
UNDECIDED    	NULL

If you use the COMPUTE BY clause, you must also include the ORDER BY clause. Now, if your ORDER BY clause is:

ORDER BY title_id, pub_id, au_id

then your COMPUTE BY clause can be one of the following:

COMPUTE aggregate_function (column_name) 
BY title_id, pub_id, au_id

or

COMPUTE aggregate_function (column_name) 
BY title_id, pub_id

or

COMPUTE aggregate_function (column_name) 
BY title_id

As you can see, the columns listed in the COMPUTE BY clause must be the same as, or a subset of the columns in the ORDER BY clause. The order of the columns in the COMPUTE BY clause must also be in the same order as those in the ORDER BY clause and you cannot skip columns. Enter the following and look at the results.

SELECT type, ytd_sales
FROM titles
ORDER BY type
COMPUTE SUM(ytd_sales) BY type
type    	ytd_sales
------------    ----------
business    	4095
business    	3876
business    	18722
business    	4095
    sum
    =========
    30788
type    	ytd_sales
------------    ----------
mod_cook    	2032
mod_cook    	22246
    sum
    =========
    24278
type    	ytd_sales
------------    	----------
popular_comp    	8780
popular_comp    	4095
popular_comp    	(null)
    sum
    =========
    12875
type    	ytd_sales
---------	----------
psychology    	375
psychology    	2045
psychology   	111
psychology    	4072
psychology    	3336
    sum
    =========
    9939
type    	ytd_sales
---------	----------
trad_cook   	375
trad_cook    	15096
trad_cook   	4095
    sum
    =========
    19566
type    	ytd_sales
---------    	----------
UNDECIDED    	(null)
    sum
    =========
    (null)
(24 row(s) affected)


NOTE: The COMPUTE and COMPUTE BY clauses produce new rows of non-relational data. Therefore, they cannot be used with the SELECT INTO statement to create new tables.
You cannot use COMPUTE and COMPUTE BY with text or image datatypes because these are non-sortable.

Super-Aggregates (ROLLUP and CUBE)

To produce additional summary rows, referred to as super-aggregates, you can use the ROLLUP and CUBE operators. The ROLLUP and CUBE operators are new to SQL Server 6.5 and are used in conjunction with the GROUP BY clause.

The syntax for the ROLLUP and CUBE operators is:

SELECT column_list
FROM table_list
WHERE search_criteria
[GROUP BY [ALL] non_aggregate_expression(s)
[WITH {ROLLUP | CUBE}]]

The ROLLUP operator is typically used to produce running averages or running sums. This is done by applying the aggregate function in the SELECT column_list to each column in the GROUP BY clause, moving from left to right. What does that mean? This is most easily understood with an example.

SELECT type, pub_id, 
SUM(ytd_sales) AS ytd_sales
FROM titles
GROUP BY type, pub_id
WITH ROLLUP
type    	pub_id    	ytd_sales
-----          ------    	----------
business    	736    	18722
business    	1389    	12066
business    	(null)    	30788
mod_cook    	877    	24278
mod_cook    	(null)   	24278
popular_comp   1389    	12875
popular_comp   (null)   	12875
psychology    	736    	9564
psychology    	877    	375
psychology    	(null)    	9939
trad_cook    	877    	19566
trad_cook    	(null)    	19566
UNDECIDED    	877    	(null)
UNDECIDED    	(null)    	(null)
(null)    	(null)    	97446
(15 row(s) affected)

ANALYSIS: The ROLLUP operator produced a row in your output for each row in the titles table with a single type and pub_id. It then showed you the ytd_sales for each item. It then produced an additional row for each type with summary information. In your example, the rows with (null) in the pub_id field display the SUM of all of the ytd_sales for that group of types.

Look at it in layman's terms: In the titles table, there are two rows that had both a business type and a unique pub_id. (In the real table, there are a total of four books of type business with one author, pub_id, writing three books and another author writing the fourth). Each author who wrote business books had ytd_sales of 18,722 and 12,066 respectively.

The ROLLUP operator then created a subtotal field that summed all of the Business-type books, (18,722 + 12,066) 30,788. The query then did the same thing for each group of book types and authors in the table and then gave a grand total (97,446), signified by a (Null) value in both the type and pub_id fields.

The CUBE operator produces super-aggregate rows by using every possible combination of the columns in the GROUP BY clause. Like the ROLLUP operator, the CUBE operator produces the running averages and running sums, but also cross-references columns to return additional summary rows. Here is an example:

SELECT type, pub_id, 
SUM(ytd_sales) AS ytd_sales
FROM titles
GROUP BY type, pub_id
WITH CUBE
type    	pub_id    	ytd_sales
------------   	------    	----------
business    	736    	18722
business    	1389    	12066
business    	(null)    	30788
mod_cook    	877    	24278
mod_cook    	(null)    	24278
popular_comp    	1389    	12875
popular_comp   	(null)    	12875
psychology    	736    	9564
psychology    	877    	375
psychology    	(null)    	9939
trad_cook    	877    	19566
trad_cook    	(null)    	19566
UNDECIDED    	877    	(null)
UNDECIDED    	(null)    	(null)
(null)    	(null)    	97446
(null)    	736    	28286
(null)    	877    	44219
(null)    	1389    	24941
(18 row(s) affected)

ANALYSIS: The CUBE operator produced a row in your output for each row in the titles table with a single type and pub_id. It then showed you the ytd_sales for each item. It then produced an additional row for each type with summary information and another row for each pub_id. In this example, the rows with (null) in the pub_id field display the SUM of all of the ytd_sales for that group of types; the rows with (null) in the type field display the SUM of all of the ytd_sales for that group of pub_ids.

When using the ROLLUP or CUBE, some restrictions are placed on the GROUP BY clause. You can have a maximum of 10 columns in the GROUP BY clause and the sum of the sizes of those columns cannot exceed 900 bytes in size and you cannot use the GROUP BY ALL clause.


NOTE: The ROLLUP and CUBE operators produce new rows of non-relational data. Therefore, you cannot use them with the SELECT INTO statement. Also, you cannot use ROLLUP and CUBE operators with text or image datatypes, as these are non-sortable.

Data Correlation

In this section, you look at implementing joins to retrieve data from two or more tables. The results will appear as a single table with columns from all the tables specified in the SELECT column_list and meeting the search criteria. First you will look at how to implement joins using both ANSI and SQL Server syntax, and then the different types of joins: inner joins, cross joins, outer joins, and self joins.

Implementing Joins

To join tables, a comparison of one or more columns from a table to one or more columns in one or more tables needs to be made. The result of the comparison produces new rows by combining the columns in the SELECT column_list from the joined tables that meet the join conditions. When you join tables, you can use either ANSI or SQL Server join syntax. The syntax for both ANSI and SQL Server syntax are as follows:

ANSI join syntax:

SELECT table_name.column_name, [...]
FROM {table_name [join_type] JOIN table_name 
ON search_criteria}, [...]
WHERE search_criteria

The join statements for the ANSI syntax show up in the FROM clause of the SELECT statement. The WHERE clause selects rows from the joined rows to be returned. There are three types of ANSI join statements you can choose: INNER JOIN, OUTER JOIN, and CROSS JOIN.

SQL Server join syntax:

SELECT table_name.column_name, [...]
FROM table_list
WHERE table_name.column_name 
join_operator table_name.column_name, [...]

In SQL Server syntax, the FROM clause lists the tables involved in the join. The WHERE clause includes the columns to be joined and can include additional search criteria that determines the rows to be returned. The join operators for SQL Server syntax are: =, >, <, >=, <=, <>, !>, and !<.

Inner Joins

Joins connect two tables based on a join condition producing results as a new table with the rows that satisfy the join condition. Inner joins produce information where matching information is found in both tables. The most common types of inner joins are equijoins and natural joins.

In an equijoin, column values are compared for equality and redundant columns are displayed as columns in the result set whereas in a natural join, the redundant columns are not displayed twice.

Take a look at the following example to clarify what was just said.

SQL Server syntax:

SELECT *
FROM publishers, pub_info
WHERE publishers.pub_id = pub_info.pub_id

ANSI syntax:

SELECT *
FROM publishers
INNER JOIN  pub_info ON publishers.pub_id = pub_info.pub_id

Due to page width constraints, the following table is split into two parts. However, as output, this should be on one line.

pub_id	pub_name	city	state	country	pub_id	logo	pr_info
----	--------	------	----	----	----	--------	--------
736	New Moon Books	Boston	MA	USA	736	NEWMOON.BMP	New Moon 								Books...
877	Binnet & Hardley	Washington	DC	USA	877	BINNET.BMP	Binnet & 								Hardley...
1389	Algodata Infosystems	Berkeley	CA	USA	1389	ALGODATA.BMP	Algodata								Infosystem
1622	Five Lakes Publishing	Chicago	IL	USA	1622	5LAKES.BMP	Five Lakes 								Publishing
1756	Ramona Publishers	Dallas	TX	USA	1756	RAMONA.BMP	Ramona								Publishers
9901	GGG&G	München	NULL	GER	9901	GGGG.BMP	GGG&G...
9952	Scootney Books	New York	NY	USA	9952	SCOOTNEY.BMP	Scootney 								Books...
9999	Lucerne Publishing	Paris	NULL	FRA	9999	LUCERNE.BMP	Lucerne 								Publishing...
(8 row(s) affected)

In this example the SELECT statement selects all columns from both the publishers and pub_info tables where the pub_id columns for the joined tables are equal. Note the redundant pub_id column.

Natural Joins

In a natural join, column values are compared for equality, but redundant columns are eliminated from the columns in the result set.

Here is an example of a natural join.

SQL Server syntax:

SELECT publishers.*, pub_info.logo, pub_info.pr_info
FROM publishers, pub_info
WHERE publishers.pub_id = pub_info.pub_id

ANSI syntax:

SELECT publishers.*, pub_info.logo, pub_info.pr_info
FROM publishers 
INNER JOIN pub_info ON publishers.pub_id = pub_info.pub_id
pub_id 	pub_name    	city    	state    country  	...
----   	--------    	--------    	----    	----    	...
736    	New Moon Books    	Boston    	MA    	USA    	...
877    	Binnet & Hardley    	Washington  DC    	USA  	...
1389    	Algodata Infosystems   	Berkeley    	CA    	USA    	...
1622    	Five Lakes Publishing   	Chicago    	IL    	USA    	...
1756    	Ramona Publishers    	Dallas    	TX    	USA    	...
9901    	GGG&G    	München    	NULL    	GER    	...
9952    	Scootney Books    	New York   	NY    	USA    	...
9999    	Lucerne Publishing    	Paris    	NULL   	FRA    	...
(8 row(s) affected)

In this example, the SELECT statement selects all columns from the publishers table and all columns except pub_id from the pub_info table.

Cross or Unrestricted Joins

Cross or unrestricted joins return a combination of all rows of all tables in the join as the result set. A cross or unrestricted join is created not by using the WHERE clause in the SQL Server join of two or more tables, but by using the CROSS JOIN keyword for the ANSI join.

The combining of all rows from all tables involved in the join yields what is known as a Cartesian product. In most cases this type of result set is unusable, unless your intention was to find every possible combination, such as some type of statistical or mathematical analysis. To put it another way, if you look at each table as a matrix and then you multiply the matrices, you get a new matrix with all combinations as shown in Figure 11.1. Each row from Table1 is added to each row in Table2. If you add the number of columns from both tables, you get the resulting number of columns. If you multiply the number of rows in Table1 by the number of rows in Table2, you get the total number of rows returned by your query.

Figure 11.1. Creating a Cartesian product.


NOTE: Tables cannot be joined on text or image columns. You can, however, compare the lengths of text columns from two tables with a WHERE clause, but not actual data.

Here is an example of creating a Cartesian product using the CROSS JOIN or unrestricted join. Say you want to list all book titles and their authors' IDs, and you submit the following query:

SQL Server syntax:

SELECT titles.title, titleauthor.au_id
FROM titles, titleauthor

ANSI syntax:

SELECT titles.title, titleauthor.au_id
FROM titles CROSS JOIN titleauthor
title    				    au_id
--------------    			    ----------
The Busy Executive's Database Guide    172-32-1176
The Busy Executive's Database Guide    213-46-8915
.    .
.    .
.    .
Sushi, Anyone?    			    998-72-3567
Sushi, Anyone?    			    998-72-3567
(450 row(s) affected)

The results of the query you submitted yielded 450 rows. There are 18 rows in the titles table and 25 rows in the titleauthor table. Since an unrestricted or cross join returns all possible combinations, you get 18x25 = 450 rows. Not quite what you wanted as your desired results, right?

To avoid submitting an unrestricted join, you should take the number of tables you are joining and subtract 1. N-1 is the number of join clauses needed, where N is the number of tables involved in the join (in other words, three tables, 3 - 1 = 2, two join clauses). The number of join clauses may be more if your are joining based on a composite key.

Outer Joins

You can restrict rows from one table while allowing all rows from another table as your result set by using outer joins. One of the most common uses for this type of join is to search for orphan records. You can create an outer join statement using either SQL Server or ANSI syntax. The outer join operators and keywords are as follows:

SQL Server syntax:
*= Includes all rows from the first table and only the matching rows in the second table (left outer join).
=* Includes all rows from the second table and only the matching rows in the first table (right outer join).

ANSI syntax:

LEFT OUTER JOIN Includes all rows from the first table and only the matching rows in the second table.
RIGHT OUTER JOIN Includes all rows from the second table and only the matching rows in the first table.
FULL OUTER JOIN Includes all non-matching rows from both tables.

You have a table of customers and a table with orders. These two tables are related by a CustomerID field. With an equijoin or natural join, you would only return records where the CustomerID field has a match in both tables. Outer joins can be handy to get a Customer list and if a customer happens to have an order, that order information also shows up. If they do not have an order, then the information from the orders table shows up as (null).

If you did a left outer join on these tables and specified the Customers table first, then the desired results are returned. If you specify a right outer join, then your results would show all Orders and if an Order happens to have a CustomerID that does not match a CustomerID in the Customers table, then the Customer information would be (null). (If you follow the rules of referential integrity, you should never have an order without a valid CustomerID. If this is the case, then your right outer join would have the same results as an equijoin or natural join, all Orders and Customers where there is a match on CustomerID.)

The left and right outer joins can return the same results dependent upon the table order. For example, these two joins return the same information: Customers.CustomerID *= Orders.CustomerID and Orders.CustomerID =* Customers.CustomerID.

If you wanted to find all the titles and if they happen to have sold any copies, or the number of copies sold, you could submit the following query:

SQL Server syntax:

SELECT titles.title_id, titles.title, sales.qty
FROM titles, sales
WHERE titles.title_id *= sales.title_id

ANSI syntax:

SELECT titles.title_id, titles.title, sales.qty
FROM titles LEFT  OUTER JOIN sales 
ON titles.title_id = sales.title_id
title_id     	title                                         	qty
--------     	--------------------------------------      ---
BU1032   	The Busy Executive's D...     	5
BU1032   	The Busy Executive's D...     	10
BU1111   	Cooking with Computers...     	25
BU2075   	You Can Combat Compute...    	35
BU7832   	Straight Talk About Co...    	15
MC2222   	Silicon Valley Gastron...     	10
MC3021   	The Gourmet Microwave        	25
MC3021   	The Gourmet Microwave        	15
MC3026   	The Psychology of Comp...     	(null)
PC1035   	But Is It User Friendl...    	30
PC8888   	Secrets of Silicon Val...    	50
PC9999   	Net Etiquette        	(null)
PS1372   	Computer Phobic AND No...    	20
PS2091   	Is Anger the Enemy?       	3
PS2091   	Is Anger the Enemy?        	75
PS2091   	Is Anger the Enemy?        	10
PS2091   	Is Anger the Enemy?        	20
PS2106   	Life Without Fear            	25
PS3333   	Prolonged Data Deprivat...     	15
PS7777   	Emotional Security: A ...    	25
TC3218   	Onions, Leeks, and Gar...     	40
TC4203   	Fifty Years in Bucking...    	20
TC7777   	Sushi, Anyone?        	20
(23 row(s) affected)

When using an outer join with SQL Server syntax using a NULL comparison, SQL Server yields unexpected results. For all rows in the outer table that do not match rows in the inner table based upon the NULL comparison, NULL values are returned. The ANSI syntax yields the expected results.

SQL Server syntax:

SELECT title, stor_id, qty
FROM titles, sales
WHERE titles.title_id *= sales.title_id
AND qty IS NULL
title    	stor_id    	qty
-------------------------------------------     -------    ----
The Busy Executive's Database Guide    	(null)    	(null)
Cooking with Computers: Surreptitious ...   	(null)    	(null)
You Can Combat Computer Stress!    	(null)    	(null)
.    .    .
.    .    .
.    .    .
Onions, Leeks, and Garlic: Cooking Secrets ...   	(null)    	(null)
Fifty Years in Buckingham Palace Kitchens    	(null)    	(null)
Sushi, Anyone?    	(null)    	(null)
(18 row(s) affected)

ANSI syntax:

SELECT title, stor_id, qty
FROM titles 
LEFT OUTER JOIN sales ON titles.title_id = sales.title_id
WHERE qty IS NULL
title                            	stor_id     	qty
------------------------------     	------   	------
The Psychology of Computer Cooking      	(null)    	(null)
Net Etiquette     	(null)    	(null)
(2 row(s) affected)

Self Joins

As the name implies, a self join correlates rows of a table with other rows in the same table. Comparison queries for the same information are used most often for self joins. If you want to list all authors that live in the same city and zip code, for example, you will compare city and zip by executing the following query:

SQL Server syntax:

SELECT au1.au_fname, au1.au_lname,
au2.au_fname, au2.au_lname,
au1.city, au1.zip
FROM authors au1, authors au2
WHERE au1.city = au2.city
AND au1.zip = au2.zip
AND au1.au_id < au2.au_id
ORDER BY au1.city, au1.zip

ANSI syntax:

SELECT au1.au_fname, au1.au_lname,
au2.au_fname, au2.au_lname,
au1.city, au1.zip
FROM authors au1
INNER JOIN authors au2 ON au1.city = au2.city
AND au1.zip = au2.zip
WHERE au1.au_id < au2.au_id
ORDER BY au1.city, au1.zip
au_fname   	au_lname    au_fname    au_lname    city            zip
--------  	--------    --------    --------    ------------    ------
Cheryl    	Carson      Abraham     Bennet      Berkeley        94705
Dean    	Straight    Dirk        Stringer    Oakland         94609
Dean    	Straight    Livia       Karsen      Oakland         94609
Dirk    	Stringer    Livia       Karsen      Oakland         94609
Ann    		Dull        Sheryl      Hunter      Palo Alto       94301
Anne    	Ringer      Albert      Ringer      Salt Lake City  84152
(6 row(s) affected)

ANALYSIS: Notice that when you do a self join on a table, you create an alias for the table name. This is done so that one table is treated logically as two tables.


TIP: A table alias is useful anytime you do a multi-table join operation. It enables you to create a more readable and shorter query statement, because you reference the table alias instead of the table name.

MS Query Tool

If you're still having problems with all this syntax, you're in luck. Included with the SQL Server 6.5 utilities is MS Query. MS Query is a tool that enables you to build your queries visually as shown in Day 3, "SQL Server Tools and Utilities."

Once you have built your query visually, you can cut and paste the SQL syntax generated by MS Query into ISQL/w. By pressing the SQL button in the MS Query utility, you can see the statements in Access-SQL, which is the version of SQL that Microsoft Access uses.

You must remove the quotation marks from the query syntax generated by MS Query. Otherwise you receive a syntax error when you try to run it from within ISQL/w or the Query tool in the Enterprise Manager.

You can also use table aliases as shown on Day 10, "Data Retrieval," to make the SQL statement shorter. The following example is pasted from the MS Query SQL syntax and then modified using table aliases.

SELECT a.au_lname, a.au_fname, t.title, t.price
FROM authors a, titleauthor ta, titles t
WHERE ta.au_id = a.au_id
AND t.title_id = ta.title_id

Subqueries

A SELECT statement nested inside another SELECT statement is commonly referred to as a subquery. Subqueries can produce the same results as a join operation. In this section, you will look at how the subquery is used, the types of subqueries, subquery restrictions, and correlated subqueries.

How To Use The Subquery

A SELECT statement can be nested within another SELECT, INSERT, UPDATE, or DELETE statement. If the subquery returns a single value, such as an aggregate, it can be used anywhere a single value can be used. If the subquery returns a list, such as a single column of many values, it can only be used in the WHERE clause.

In many cases a join operation can be used instead of a subquery, but some instances can only be processed as a subquery. In some cases a join operation can yield better performance than a subquery, but generally there is little performance difference.

The subquery is always enclosed within parentheses and it completes before the outer query is processed, unless there is a correlated subquery. A subquery can contain another subquery and that subquery can contain a subquery and so on. There is no practical limit to the number of subqueries that can be processed other than system resources.

The syntax for a nested SELECT statement:

(SELECT [ALL | DISTINCT] subquery_column_list  [FROM table_list]  [WHERE clause]  [GROUP BY clause] [HAVING clause])

Types of Subqueries

A subquery can be used to return a single column or single value anywhere a single value expression can be used, and a subquery can be compared against using the following operators: =, <, >, <=, >=, <>, !>, and !<. It can also return a single column or many values that can be used with the IN list comparison operator in the WHERE clause. A subquery can also return many rows that are used for an existence check by using the EXISTS keyword in the WHERE clause.

To find all the authors that live in the same state as the bookstore that sells their publisher's books, for example, you could run either of the following queries.

SELECT DISTINCT au_fname, au_lname, state
FROM authors
WHERE state IN
(SELECT state FROM stores)

or

SELECT DISTINCT au_fname, au_lname, state
FROM authors
WHERE EXISTS
(SELECT * FROM stores
WHERE state = authors.state)
au_fname    		au_lname    		state
---------          --------                -----
Abraham     		Bennet       			CA
Akiko       		Yokomoto     	   	CA
Ann         			Dull          		   	CA
Burt        		Gringlesby   	   	CA
Charlene    		Locksley     	   	CA
Cheryl      		Carson        		   	CA
Dean        		Straight     	   	CA
Dirk        		Stringer     	   	CA
Heather     		McBadden     	   	CA
Johnson     		White          		   	CA
Livia       		Karsen    		   	CA
Marjorie    		Green       		   	CA
Michael     		O'Leary    		   	CA
Reginald    		Blotchet-Halls 	   	OR
Sheryl      		Hunter        		   	CA
Stearns    		MacFeather   	   	CA
(16 row(s) affected)

Restrictions exist on what you can do with subqueries. This list provides the rules by which you can create and use a subquery.

Correlated Subqueries

A correlated subquery references a table from the outer query and evaluates each row for the outer query. In this aspect a correlated subquery differs from a normal subquery, because the subquery depends on values from the outer query. A normal subquery is executed independently of the outer query.

In the following example, the join query is rewritten as a correlated subquery. The queries return the same information and answer the question: who are the authors that live in the same city and zip code?

SQL Server JOIN syntax:

SELECT au1.au_fname, au1.au_lname,
au2.au_fname, au2.au_lname,
au1.city, au1.zip
FROM authors au1, authors au2
WHERE au1.city = au2.city
AND au1.zip = au2.zip
AND au1.au_id < au2.au_id
ORDER BY au1.city, au1.zip
au_fname   au_lname  au_fname  au_lname  city    	      zip
------     ------    ------    ------    ----------      ------
Cheryl     Carson    Abraham   Bennet    Berkeley        94705
Dean    	Straight  Dirk      Stringer  Oakland         94609
Dean    	Straight  Livia     Karsen    Oakland         94609
Dirk    	Stringer  Livia     Karsen    Oakland         94609
Ann    		Dull      Sheryl    Hunter    Palo Alto       94301
Anne    	Ringer    Albert    Ringer    Salt Lake City  84152
(6 row(s) affected)

SQL Server Correlated Subquery syntax:

SELECT au1.au_fname, au1.au_lname, au1.city, au1.zip
FROM authors au1
WHERE zip IN
(SELECT zip
FROM authors au2
WHERE au1.city = au2.city
AND au1.au_id <> au2.au_id)
ORDER BY au1.city, au1.zip
au_fname    au_lname    	 city              zip
--------    ------------    --------------    --------
Abraham     Bennet          Berkeley          94705
Cheryl      Carson          Berkeley          94705
Livia       Karsen          Oakland           94609
Dirk        Stringer        Oakland           94609
Dean        Straight        Oakland           94609
Sheryl      Hunter          Palo Alto         94301
Ann         Dull            Palo Alto         94301
Albert      Ringer          Salt Lake City    84152
Anne        Ringer          Salt Lake City    84152
(9 row(s) affected)

Notice that the same data is returned; it is simply formatted differently and is more readable.

SELECT INTO

The SELECT INTO statement enables you to create a new table based on query results. The new table is based on the columns you specify in the select list, the table(s) you name in the FROM clause, and the rows you choose in the WHERE clause. You can create two types of tables with a SELECT INTO statement: permanent and temporary. The syntax for the SELECT INTO is as follows:

SELECT column_list
INTO new_table_name
FROM table_list
WHERE search_criteria

When creating a permanent table, the SELECT INTO/BULK COPY database option must be set. The SELECT INTO statement enables you to define a table and put data into it without going through the usual data definition process. The name of the new table must be unique within the database and must conform to the rules for SQL Server naming conventions. If columns in the column_list of your SELECT statement have no titles, such as derived columns like aggregate functions, the columns in the new table will have no names; however, there are two problems with this.

1. Column names within a table must be unique. Therefore, if more than one column has no header, the SELECT INTO fails.

2. If there is a column with no header in the new table, the only way to retrieve that column is to use SELECT *. For this reason it is good practice to create column aliases for derived columns. In addition, because using the SELECT INTO is a non-logged operation, you should backup your database immediately following this operation.

You can also use the SELECT INTO statement to create temporary tables, of which there are two types: local and global. A local temporary table is available only during the current user session to SQL Server and is deallocated when the session is terminated. A local temporary table is created by preceding the new table name with the # symbol. A global temporary table is available to all user sessions to SQL Server and is deallocated when the last user session accessing the table is terminated. A global temporary table is created by preceding the new table name with two ## symbols. These temporary tables reside in the tempdb database.

SELECT title_id, title, price
INTO #tmpTitles
FROM titles
GO
SELECT * FROM #tmpTitles
GO 
 (18 row(s) affected)
title_id   Title    				               price
--------   ----------------------------------             --------
BU1032    	The Busy Executive's Database Guide            19.99
BU1111    	Cooking with Computers: Surreptitious...       11.95
BU2075    	You Can Combat Computer Stress!                 2.99
.    	  .     .
.    	  .     .
.    	  .     .
TC3218    	Onions, Leeks, and Garlic: Cooking Secrets...  20.95
TC4203    	Fifty Years in Buckingham Palace Kitchens      11.95
TC7777    	Sushi, Anyone?    				       14.99
(18 row(s) affected)



TIP: The purpose of the SELECT INTO statement is to create a new table. If you want to add rows to a preexisting table, use the INSERT statement or INSERT INTO.

UNION Operator

You can combine the results of two or more queries into a single result set by using the UNION operator. By default, duplicate rows are eliminated; however, using UNION with the ALL keyword returns all rows, including duplicates.

SELECT column_list [INTO clause]  [FROM clause]  [WHERE clause]  [GROUP BY clause]  [HAVING clause]  [UNION [ALL]  SELECT column_list  [FROM clause]  [WHERE clause]  [GROUP BY clause]  [HAVING
clause]...]  [ORDER BY clause]  [COMPUTE clause]

Here are the rules for using the UNION operator:

SELECT title, stor_name, ord_date, qty
FROM titles, sales, stores
WHERE titles.title_id = sales.title_id
AND stores.stor_id = sales.stor_id
UNION
SELECT title, `No Sales', NULL, NULL
FROM titles
WHERE title_id NOT IN
(SELECT title_id FROM sales)
ORDER BY qty
title    	         		    stor_name            ord_date        qty
--------------                 ----------            --------       ---
Net Etiquette       		    No Sales             (null)            (null)
The Psychology...  		    No Sales             (null)            (null)
Is Anger the Enemy?            Eric the Read Books   Sep 13 1994...  3
.    .    .    .
.    .    .    .
.    .    .    .
Onions, Leeks, and Garlic:...  News & Brews          Jun 15 19...    40
Secrets of Silicon Valley      Barnum's              May 24 1993...  50
Is Anger the Enemy?            Barnum's              Sep 13 1994...  75
(23 row(s) affected)

Summary

In this chapter you learned about aggregate functions like SUM and AVG and how to use them in a SELECT statement. Remember that aggregate functions can return summary values for an entire table or for groups of rows in a table. They are also normally used in conjunction with the GROUP BY or HAVING clauses, or in the column_list.

GROUP BY clauses group summary data that meets the WHERE clause criteria, to be returned as single rows. The HAVING clauses can also be used to set the criteria to be returned by the GROUP BY clause.

You also learned about the COMPUTE and COMPUTE BY clauses that can be used to produce new rows of summary and detail data. The COMPUTE clause is useful as it returns detail rows and a grand total summary row. The COMPUTE BY clause returns rows as subgroups with summary values.

In order to produce additional summary rows, which are often called super-aggregates, you can use the ROLLUP and CUBE operators in conjunction with the GROUP BY clause.

The ROLLUP operator is typically used to produce running averages or running sums. The CUBE operator produces super-aggregate rows by using every possible combination of the columns in the GROUP BY clause.

You also learned how to join tables using both the ANSI syntax as well as the SQL Server syntax. The join statements for the ANSI syntax show up in the FROM clause of the SELECT statement, whereas the WHERE clause is used for the SQL Server syntax. There are three types of ANSI join statements: INNER JOIN, OUTER JOIN and CROSS JOIN. Inner Joins are either equijoins or natural joins and return information where values in two tables are identical. Outer joins can show you all information from one table and, if the associated table has values that match, they will be displayed as well. Cross or unrestricted joins return a combination of all rows of all tables in the join as the result set.

You also learned about creating subqueries. This occurs when you nest a SELECT statement inside another SELECT statement. Subqueries can produce the same results as a join operation.

The SELECT INTO statement enables you to create a new table based on query results. You can create either permanent tables or temporary tables. When you create a permanent table, you must have the SELECT INTO/Bulkcopy option set to true for the database. To create temporary tables, you prefix the table name with a # for a local temporary table, or a ## symbol for a global temporary table.

Finally, you learned how to combine the results of two or more queries into a single result set by using the UNION operator.

Q&A

Q When should I use ANSI syntax and when should I use SQL Server Syntax?

A
In general, you should try to use the ANSI syntax as this implementation is a standard, the code is much more portable, and there is little or no performance gain/loss.

Q What does the following statement do?
SELECT SUM(qty) FROM sales
A This returns the sum of the qty field in the sales table.

Q Is it a good idea to use COMPUTE and COMPUTE BY?


A
It is okay to use these statements, but remember that the data they return is non-relational.

Workshop

The Workshop provides quiz questions to help you solidify your understanding of the material covered, as well as exercises to provide you with experience in using what you've learned. Try to understand the quiz and exercise answers before continuing on to the next day's lesson. Answers are provided in Appendix B.

Quiz

1. Can I run a subquery as a join and vice versa?

2. True or false: ROLLUP and CUBE do not supply summary information.

3. Can I use a SELECT INTO statement to build a temporary table that everyone has access to?

Exercises

1. Create a query that returns the average of the ytd_sales figures from the titles table in the pubs database.

2. Using the GROUP BY and HAVING clauses, create a query that finds all the books with more than one author. (Hint: Use the titleauthor table.)

3. Using the COMPUTE BY clause, create a query that reports the stor_id and a running sum of the quantity of books ordered. Use the sales table to do this.

4. Create a query using joins (either a SQL Server join, or an ANSI join) to show an author's first name, last name, and the title of their books. (Use the au_fname, au_lname from the authors table and the title field from the titles table.)

5. Create a subquery to find which authors live in the same state as any of the stores.

6. Create a temporary table containing all the information from the employees table. Test the existence of your new table by selecting data from it.


Previous chapterNext chapterContents


Macmillan Computer Publishing USA

© Copyright, Macmillan Computer Publishing. All rights reserved.