11.8 How do I…Prevent the selection of duplicate rows?

Problem

In some of my queries, duplicate rows are returned. Even though the records are valid for the query, I don’t want to return duplicate rows from the query. In many cases, I want to view the unique values within a column. How do I prevent the selection of duplicate rows and present unique results?

Technique

The DISTINCT operator causes only unique rows to be returned by a query. If any column in the query makes the row unique, the row is returned. The keyword is placed at the beginning of the select list and needs to be specified only once in the query.

Steps

1. Run SQL*Plus and connect as the WAITE user account. CHP11_20.SQL, shown in Figure 11.16, creates the sample tables and records used in this How-To.

The data created in the sample table contains duplicate values for some of the columns. When the data is queried using the DISTINCT operator, the duplicate column values are eliminated. Run the statement to create the table and data.

SQL> START CHP11_20.sql

Table dropped.

Table created.

1 row created.

1 row created.

1 row created.

1 row created.

Commit complete.

2. Load CHP11_21.SQL into the SQL buffer. The file contains a query of the sample table that does not use the DISTINCT keyword to eliminate duplicate rows.

SQL> GET CHP11_21.sql

1 SELECT

2 DEPT_NO

3 FROM

4 EMP11

5* ORDER BY DEPT_NO

Line 2 specifies the column returned by the query. Line 4 presents the sample table as the source of the data, and line 5 specifies the order of the resulting data.

3. Run the statement to show the results.

SQL> /

DEPT_NO

----------

1

1

2

2

If the purpose of the query is to display the unique department numbers, the duplicate data makes the query results hard to read. Even though each record contains a unique employee, the DEPT_NO column is duplicated among records. If the source table contains hundreds of records, the results of this query are unreadable.

4. Load CHP11_22.SQL into the SQL buffer. The file contains a query that returns unique department numbers by specifying the DISTINCT keyword before the select list.

SQL> GET CHP11_22.sql

1 SELECT

2 DISTINCT DEPT_NO

3 FROM

4 EMP11

5* ORDER BY DEPT_NO

Line 2 uses the DISTINCT keyword to prevent the query from returning duplicate department numbers. The remaining lines of the query are identical to the query presented in Step 2.

5. Run the statement to display the results.

SQL> /

DEPT_NO

----------

1

2

Regardless of the number of rows in the table, only distinct departments are returned by the query. The DISTINCT keyword applies to all columns following it in the select list.

6. Load CHP11_23.SQL into the SQL buffer. The query returns two columns after the DISTINCT operator.

SQL> GET CHP11_23.sql

1 SELECT

2 DISTINCT DEPT_NO, EMP_NO

3 FROM

4 EMP11

5* ORDER BY DEPT_NO

Line 2 specifies two columns to be returned by the query. The DISTINCT keyword will cause distinct DEPT_NO, EMP_NO combinations to be returned by the query.

7. Run the query to display the results.

DEPT_NO EMP_NO ---------- --------- 1 1 1 2 2 3 2 4

Even though duplicate DEPT_NO columns are returned by the query, all the rows returned are distinct.

How It Works

The DISTINCT keyword is used to prevent queries from returning duplicate rows. Step 1 creates the sample tables and records used throughout this How-To. Steps 2 and 3 query the records in the table without using the DISTINCT keyword. Steps 4 and 5 use the DISTINCT keyword to return distinct DEPT_NO columns from the sample table. Steps 6 and 7 show the results of a query with two columns specified in a select list using the DISTINCT keyword.

Comments

The DISTINCT operator is useful in analyzing data in a table. When you use the DISTINCT keyword, keep in mind that it affects the entire select list. You don’t need to specify the keyword for each column in the list. The keyword is useful within subqueries to limit the number of rows to be processed by the query.