Database: SELECT: GROUP BY
The result of the FROM and WHERE clauses is the join of tables with records
chosen for the result by the conditions of the WHERE clause. When
you want summary information about groups of records, you will use the GROUP
BY clause. The examples below are based on the
University Database.
A way to simplify thinking about GROUP BY problems is to break the problem
into two steps:
- Show all the information, and order it by the relevant field.
- Replace the fields you want to summarize (count, find average of, ...)
by the appropriate summary functions.
There is no need to make these two separate steps, but use them if
you're having trouble thinking about the problem.
Example - Count Ed Centers in Country
Problem: How many Ed Centers are there in each country?
Let's first find all the information before summarizing it.
/* Display countries and Ed Center names */
SELECT edc_country, edc_name
FROM Ed_Centers
ORDER BY edc_country;
This produces a list, ordered by country, of all the Ed Centers.
It's close to what we want, but we want only one line for each of the
countries and a count of the Ed Centers.
/* Display countries and count of Ed Centers */
SELECT edc_country, COUNT(*)
FROM Ed_Centers
GROUP BY edc_country
ORDER BY edc_country;
Because GROUP BY usually looks at the records in sorted order to get all the
same values together, the ORDER BY clause may sometimes be redundant, but you should
still use it because some DBMS may use another means of doing the GROUP BY.
And, if they are already ordered as a result of GROUP BY, the ORDER BY will
not require any extra processing.
Example - Count courses that each instructor is teaching
Problem: How many courses is each instructor teaching?
Let's illustrate how GROUP BY works with an example. Using the University
Database, we might want to ask how many classes each instructor is
teaching. Let's first display all the courses that each instructor is teaching.
/* Display all instructors and the courses they are teaching */
SELECT Instructors.ins_id, ins_lname, ins_fname, sec_id
FROM Instructors, Sections
WHERE Instructors.ins_id = Sections.ins_id
ORDER BY ins_lname, ins_fname;
This contains all the information that we need. Using a GROUP BY
clause and selecting the count instead of the crs_id will let us count the
number of records for each instructor.
/* Display all instructors and the courses they are teaching */
SELECT Instructors.ins_id, ins_lname, ins_fname, COUNT(*)
FROM Instructors, Sections
WHERE Instructors.ins_id = Sections.ins_id
GROUP BY Instructors.ins_id, ins_lname, ins_fname
ORDER BY ins_lname, ins_fname;
Example - Compute the average GPA for students in each of the courses
Problem: Compute the average GPA for students in each of the courses.
This requires joining the Students
, Enrollments
, and Sections
table.
First let's print all the student GPAs in each of the courses (crs_id).
/* Stage 1 - list all courses and GPAs of all students in them. */
SELECT crs_id, stu_gpa
FROM Students, Enrollments, Sections
WHERE Students.stu_id = Enrollments.stu_id
AND Enrollments.sec_id = Sections.sec_id
ORDER BY crs_id;
Now use the AVG summary function with a GROUP BY clause.
/* Stage 2 - Add GROUP BY and use AVG function */
SELECT crs_id, AVG(stu-gpa)
FROM Students, Enrollments, Sections
WHERE Students.stu_id = Enrollments.stu_id
AND Enrollments.sec_id = Sections.sec_id
GROUP BY crs_id
ORDER BY crs_id;
GROUP BY retains only mentioned fields
When the records are grouped, all fields except those which are explicitly
given in GROUP BY are removed. The effect of this is that the SELECT
clause can refer only to the GROUP BY fields, except as parameters to the
summary functions.