Database: SELECT: Logical Order

The general form of the select statement with all of its clauses is
   SELECT columns/expressions
       FROM tables/joins
       WHERE row conditions
       GROUP BY columns
       HAVING grouping conditions
       ORDER BY column
When trying to figure out how a select statement logically works, here is the order. The actual order that the DBMS uses will probably be quite different, and unknown, because it will try to optimize the access. This describes the process for SQL1, but SQL2 FROM clause joins are fairly simple to reduce to the SQL1 FROM ... WHERE syntax. Altho this is a bit simplified because it ignores subqueries, it is useful in most cases.

Logical order

  1. FROM clause tables are used to produce a cross join (product, cartesian product). This is a table which consists of every possible combination of every row in all tables participating in the join. If we were to join 6,000 row table with a 3,000 row table the cross join table would have 6,000 x 3,000 = 18,000,000 rows! Of course, the DBMS will optimize this and produce a smaller table internally, but this large table is the logical result, and it's easier to work believing that this giant cross join is used.
  2. The WHERE condition are tested against each row of the FROM cross join result. Only those rows producing a TRUE result (neither FALSE nor UNKNOWN). are kept in the result.
  3. If there is a GROUP BY grouping-columns clause, it is applied to the previous (FROM, WHERE) result. All rows with the same values in the grouping-columns are put together. All rows having the same column value are reduced to a single row. What is in the columns in the row is specified by the SELECT statement. It must have specified one of the following: Note that the table resulting from GROUP BY is very different than what came in from the join, both in rows and columns.
  4. A HAVING group-condition is similar to the the WHERE clause, except it applies to the result of the GROUP BY clause. It can only refer to the selected group columns, not fields in the records before the grouping. It typically will make a comparison using summary functions.
  5. The ORDER BY clause is now applied to the previous result.
  6. Finally, the SELECT columns are chosen. Columns which are not specified in the SELECT clause are not included in the result. These columns can be column names or expressions. If the DISTINCT keyword is specified, multiple rows with the save value will be condensed into only one row. If there is a GROUP BY clause, the column