Database: SELECT: Logical Order
The general form of the select statement with all of its clauses
is
SELECT columns/expressions
FROM tables/joins
WHERE row conditions
GROUP BY columns
HAVING grouping conditions
ORDER BY column
When trying to figure out how a select statement logically works,
here is the order. The actual order that the DBMS uses will
probably be quite different, and unknown, because it will try to optimize
the access. This describes the process for SQL1, but SQL2 FROM clause joins are fairly
simple to reduce to the SQL1 FROM ... WHERE syntax. Altho this is a bit simplified
because it ignores subqueries, it is useful in most cases.
Logical order
- FROM clause tables are used to produce a cross join (product, cartesian product).
This is a table which consists of every possible combination of every row
in all tables participating in the join. If we were to join 6,000 row table with a 3,000 row
table the cross join table would have 6,000 x 3,000 = 18,000,000 rows!
Of course, the DBMS will optimize this and produce
a smaller table internally, but this large table is the logical result,
and it's easier to work believing that this giant cross join is used.
- The WHERE condition are tested against each row of the FROM cross join
result. Only those rows producing a TRUE result (neither FALSE nor UNKNOWN).
are kept in the result.
- If there is a GROUP BY grouping-columns clause, it is
applied to the previous (FROM, WHERE) result. All rows with the same
values in the grouping-columns are put together.
All rows having the same column value are reduced to a single row.
What is in the columns in the row is specified by the SELECT statement.
It must have specified one of the following:
- One or more of the grouping columns.
- One or more of the summary (aggregate, column) functions.
- A constant.
- An expression built from the above.
Note that the table resulting from GROUP BY is very different than
what came in from the join, both in rows and columns.
- A HAVING group-condition is similar to the the WHERE clause, except
it applies to the result of the GROUP BY clause. It can only refer to the
selected group columns, not fields in the records before the grouping.
It typically will make a comparison using summary functions.
- The ORDER BY clause is now applied to the previous result.
- Finally, the SELECT columns are chosen. Columns which are not specified in the
SELECT clause are not included in the result. These columns can be
column names or expressions. If the DISTINCT keyword is specified,
multiple rows with the save value will be condensed into only one row.
If there is a GROUP BY clause, the column