Database: SELECT: GROUP BY

The result of the FROM and WHERE clauses is the join of tables with records chosen for the result by the conditions of the WHERE clause.   When you want summary information about groups of records, you will use the GROUP BY clause. The examples below are based on the University Database.

A way to simplify thinking about GROUP BY problems is to break the problem into two steps:
  1. Show all the information, and order it by the relevant field.
  2. Replace the fields you want to summarize (count, find average of, ...) by the appropriate summary functions.
There is no need to make these two separate steps, but use them if you're having trouble thinking about the problem.

Example - Count Ed Centers in Country

Problem: How many Ed Centers are there in each country?
Let's first find all the information before summarizing it.
/* Display countries and Ed Center names */
SELECT edc_country, edc_name
    FROM Ed_Centers
    ORDER BY edc_country;
This produces a list, ordered by country, of all the Ed Centers. It's close to what we want, but we want only one line for each of the countries and a count of the Ed Centers.
/* Display countries and count of Ed Centers */
SELECT edc_country, COUNT(*)
    FROM Ed_Centers
    GROUP BY edc_country
    ORDER BY edc_country;
Because GROUP BY usually looks at the records in sorted order to get all the same values together, the ORDER BY clause may sometimes be redundant, but you should still use it because some DBMS may use another means of doing the GROUP BY. And, if they are already ordered as a result of GROUP BY, the ORDER BY will not require any extra processing.

Example - Count courses that each instructor is teaching

Problem: How many courses is each instructor teaching?
Let's illustrate how GROUP BY works with an example.  Using the University Database, we might want to ask how many classes each instructor is teaching.  Let's first display all the courses that each instructor is teaching.
/* Display all instructors and the courses they are teaching */
SELECT Instructors.ins_id, ins_lname, ins_fname, sec_id
    FROM Instructors, Sections
    WHERE Instructors.ins_id = Sections.ins_id
    ORDER BY ins_lname, ins_fname;
This contains all the information that we need.  Using a GROUP BY clause and selecting the count instead of the crs_id will let us count the number of records for each instructor.
/* Display all instructors and the courses they are teaching */
SELECT Instructors.ins_id, ins_lname, ins_fname, COUNT(*)
    FROM Instructors, Sections
    WHERE Instructors.ins_id = Sections.ins_id
    GROUP BY Instructors.ins_id, ins_lname, ins_fname
    ORDER BY ins_lname, ins_fname;

Example - Compute the average GPA for students in each of the courses

Problem: Compute the average GPA for students in each of the courses.
This requires joining the Students, Enrollments, and Sections table. First let's print all the student GPAs in each of the courses (crs_id).
/* Stage 1 - list all courses and GPAs of all students in them. */
SELECT crs_id, stu_gpa
    FROM Students, Enrollments, Sections
    WHERE Students.stu_id = Enrollments.stu_id
      AND Enrollments.sec_id = Sections.sec_id
    ORDER BY crs_id;
Now use the AVG summary function with a GROUP BY clause.
/* Stage 2 - Add GROUP BY and use AVG function */
SELECT crs_id, AVG(stu-gpa)
    FROM Students, Enrollments, Sections
    WHERE Students.stu_id = Enrollments.stu_id
      AND Enrollments.sec_id = Sections.sec_id
    GROUP BY crs_id
    ORDER BY crs_id;

GROUP BY retains only mentioned fields

When the records are grouped, all fields except those which are explicitly given in GROUP BY are removed.  The effect of this is that the SELECT clause can refer only to the GROUP BY fields, except as parameters to the summary functions.