Database: SELECT: DISTINCT
When the output would include many copies of the same information,
the SQL keyword DISTINCT may be used to show only one copy.
For an example from the University Database,
imagine the output for the following:
Request: Who is currently teaching courses?
A plan: The Section table contains only current courses,
so we can join it with the Instructor table, which may contain
instructors who are not currently teaching.
Instructors who can be joined with a current section
must therefore currently be teaching.
/* First version */
SELECT ins_lname, ins_fname
FROM Instructor, Section
WHERE Instructor.ins_id = Section.ins_id
ORDER BY ins_lname, ins_fname;
Removing redundant rows with DISTINCT
The problem with the above SELECT is that there will be a row
for each section that an instructor teaches. This is not
what was requested. We can use the DISTINCT keyword
immediately after SELECT to reduce all identical lines
to only one.
/* Second version */
SELECT DISTINCT ins_lname, ins_fname
FROM Instructor, Section
WHERE Instructor.ins_id = Section.ins_id
ORDER BY ins_lname, ins_fname;
A possible side-effect of DISTINCT
is that the
records might be sorted. However, they might not be because there is nothing
in the SQL standard says that DISTINCT
should put the
records in order.
They may be, but may not be, depenting on the DBMS used.
If you really want them sorted, don't rely on the side effects of DISTINCT
which occur in some systems. Always use ORDER BY
if you really want the records ordered.
Removing too many rows?
DISTINCT may have the effect of removing unintended rows.
In the previous example it will reduce two different instructors
with the same name to a single row (I've had two coworkers with the same name).
Because RDBMS tables typically have a unique primary key,
we can use the unique key to prevent collapsing same named, but different,
instructors into a single line.
/* Third version */
SELECT DISTINCT Instructor.ins_id, ins_lname, ins_fname
FROM Instructor, Section
WHERE Instructor.ins_id = Section.ins_id
ORDER BY ins_lname, ins_fname;
Note that we had to qualify ins_id
with a table name
because it appears in two of the joined tables. Yes, they will have
the same value, but you generally have to qualify ambiguous
fields.
MS Access DISTINCTROW
Microsoft Access introduced an additional keyword, DISTINCTROW
,
which can be used in place of DISTINCT
.
This keeps resulting rows which have identical values, but come from
different records from collapsing into a single row.
With DISTINCTROW
we don't
have to include the key in the output.
/* MS Access only version */
SELECT DISTINCTROW ins_lname, ins_fname
FROM Instructor, Section
WHERE Instructor.ins_id = Section.ins_id
ORDER BY ins_lname, ins_fname;