Documentation Source Text

Check-in [cf8be0cd71]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Enhanced SELECT documentation: State explicitly that the value of a bare column in an aggregate query with no input rows is arbitrary.
Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA3-256: cf8be0cd71d24e5ec6fdecaf894ca417bf1a1bab1055b082ff4c5428493f8c22
User & Date: drh 2021-08-06 22:22:21
Context
2021-08-09
22:00
Begin documenting the fts5 on-disk format. (check-in: 018ff48a67 user: dan tags: trunk)
2021-08-06
22:22
Enhanced SELECT documentation: State explicitly that the value of a bare column in an aggregate query with no input rows is arbitrary. (check-in: cf8be0cd71 user: drh tags: trunk)
2021-08-01
21:40
For .expert command, indicate how to deal with schema using extension functions. (check-in: f1c94aeb9c user: larrybr tags: trunk)
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Changes to pages/lang_select.in.

75
76
77
78
79
80
81
82
83
84

85
86
87
88
89
90
91

<p>There are two types of simple SELECT statement - aggregate and 
non-aggregate queries. ^A simple SELECT statement is an aggregate query if
it contains either a GROUP BY clause or one or more aggregate functions
in the result-set. ^Otherwise, if a simple SELECT contains no aggregate
functions or a GROUP BY clause, it is a non-aggregate query.

<p><b>1. Determination of input data (FROM clause processing).</b>
<tcl>hd_fragment fromclause</tcl>
<tcl>hd_keywords {FROM clause}</tcl>


<p>The input data used by a simple SELECT query is a set of <i>N</i> rows 
each <i>M</i> columns wide.

<p>^(If the FROM clause is omitted from a simple SELECT statement, then the 
input data is implicitly a single row zero columns wide)^ (i.e. <i>N</i>=1 and
<i>M</i>=0).







<


>







75
76
77
78
79
80
81

82
83
84
85
86
87
88
89
90
91

<p>There are two types of simple SELECT statement - aggregate and 
non-aggregate queries. ^A simple SELECT statement is an aggregate query if
it contains either a GROUP BY clause or one or more aggregate functions
in the result-set. ^Otherwise, if a simple SELECT contains no aggregate
functions or a GROUP BY clause, it is a non-aggregate query.


<tcl>hd_fragment fromclause</tcl>
<tcl>hd_keywords {FROM clause}</tcl>
<h2>Determination of input data (FROM clause processing)</h2>

<p>The input data used by a simple SELECT query is a set of <i>N</i> rows 
each <i>M</i> columns wide.

<p>^(If the FROM clause is omitted from a simple SELECT statement, then the 
input data is implicitly a single row zero columns wide)^ (i.e. <i>N</i>=1 and
<i>M</i>=0).
173
174
175
176
177
178
179
180

181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220

221
222
223
224
225
226
227

<p>^(When more than two tables are joined together as part of a FROM clause,
the join operations are processed in order from left to right. In other 
words, the FROM clause (A join-op-1 B join-op-2 C) is computed as 
((A join-op-1 B) join-op-2 C).)^

<tcl>hd_fragment crossjoin {treats the CROSS JOIN operator specially}</tcl>
<p><b>Side note: Special handling of CROSS JOIN.</b>

^There is no difference between the "INNER JOIN", "JOIN" and "," join
operators. They are completely interchangeable in SQLite.
^(The "CROSS JOIN" join operator produces the same result as the 
"INNER JOIN", "JOIN" and "," operators)^, but is 
[CROSS JOIN|handled differently by the query optimizer]
in that it prevents the query optimizer from reordering
the tables in the join.  An application programmer can use the CROSS JOIN 
operator to directly influence the algorithm that is chosen to implement
the SELECT statement.  Avoid using CROSS JOIN except in specific situations 
where manual control of the query optimizer is desired.  Avoid using
CROSS JOIN early in the development of an application as doing so is
a <a href="http://c2.com/cgi/wiki?PrematureOptimization">premature
optimization</a>.  The special handling of CROSS JOIN is an SQLite-specific
feature and is not a part of standard SQL.
       

<tcl>hd_fragment whereclause</tcl>
<tcl>hd_keywords {WHERE clause}</tcl>
<p><b>2. WHERE clause filtering.</b>

<p>^(If a WHERE clause is specified, the WHERE expression is evaluated for 
each row in the input data as a [boolean expression]. Only rows for which the
WHERE clause expression evaluates to true are included from the dataset before
continuing.)^  Rows are excluded from the result if the WHERE clause
evaluates to either false or NULL.

<p>For a JOIN or INNER JOIN or CROSS JOIN, there is no difference between 
a constraint expression in the WHERE clause and one in the ON clause.  However,
for a LEFT JOIN or LEFT OUTER JOIN, the difference is very important.  
In a LEFT JOIN,
the extra NULL row for the right-hand table is added after ON clause processing
but before WHERE clause processing.  A constraint of the form "left.x=right.y"
in an ON clause will therefore allow through the added all-NULL rows of the
right table.  But if that same constraint is in the WHERE clause a NULL in
"right.y" will prevent the expression "left.x=right.y" from being true, and
thus exclude that row from the output.

<p><b>3. Generation of the set of result rows.</b>
<tcl>hd_fragment resultset</tcl>
<tcl>hd_keywords {result-set expressions} {GROUP BY}</tcl>


<p>Once the input data from the FROM clause has been filtered by the
WHERE clause expression (if any), the set of result rows for the simple 
SELECT are calculated. Exactly how this is done depends on whether the simple 
SELECT is an aggregate or non-aggregate query, and whether or not a GROUP
BY clause was specified.








|
>
|

















|


















<


>







173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218

219
220
221
222
223
224
225
226
227
228

<p>^(When more than two tables are joined together as part of a FROM clause,
the join operations are processed in order from left to right. In other 
words, the FROM clause (A join-op-1 B join-op-2 C) is computed as 
((A join-op-1 B) join-op-2 C).)^

<tcl>hd_fragment crossjoin {treats the CROSS JOIN operator specially}</tcl>
<h2>Special handling of CROSS JOIN.</h2>

<p>^There is no difference between the "INNER JOIN", "JOIN" and "," join
operators. They are completely interchangeable in SQLite.
^(The "CROSS JOIN" join operator produces the same result as the 
"INNER JOIN", "JOIN" and "," operators)^, but is 
[CROSS JOIN|handled differently by the query optimizer]
in that it prevents the query optimizer from reordering
the tables in the join.  An application programmer can use the CROSS JOIN 
operator to directly influence the algorithm that is chosen to implement
the SELECT statement.  Avoid using CROSS JOIN except in specific situations 
where manual control of the query optimizer is desired.  Avoid using
CROSS JOIN early in the development of an application as doing so is
a <a href="http://c2.com/cgi/wiki?PrematureOptimization">premature
optimization</a>.  The special handling of CROSS JOIN is an SQLite-specific
feature and is not a part of standard SQL.
       

<tcl>hd_fragment whereclause</tcl>
<tcl>hd_keywords {WHERE clause}</tcl>
<h2>WHERE clause filtering.</h2>

<p>^(If a WHERE clause is specified, the WHERE expression is evaluated for 
each row in the input data as a [boolean expression]. Only rows for which the
WHERE clause expression evaluates to true are included from the dataset before
continuing.)^  Rows are excluded from the result if the WHERE clause
evaluates to either false or NULL.

<p>For a JOIN or INNER JOIN or CROSS JOIN, there is no difference between 
a constraint expression in the WHERE clause and one in the ON clause.  However,
for a LEFT JOIN or LEFT OUTER JOIN, the difference is very important.  
In a LEFT JOIN,
the extra NULL row for the right-hand table is added after ON clause processing
but before WHERE clause processing.  A constraint of the form "left.x=right.y"
in an ON clause will therefore allow through the added all-NULL rows of the
right table.  But if that same constraint is in the WHERE clause a NULL in
"right.y" will prevent the expression "left.x=right.y" from being true, and
thus exclude that row from the output.


<tcl>hd_fragment resultset</tcl>
<tcl>hd_keywords {result-set expressions} {GROUP BY}</tcl>
<h2>Generation of the set of result rows</h2>

<p>Once the input data from the FROM clause has been filtered by the
WHERE clause expression (if any), the set of result rows for the simple 
SELECT are calculated. Exactly how this is done depends on whether the simple 
SELECT is an aggregate or non-aggregate query, and whether or not a GROUP
BY clause was specified.

294
295
296
297
298
299
300
301

302
303
304
305
306
307
308
309
    set of result rows. ^Subject to filtering associated with the DISTINCT
    keyword, the number of rows returned by an aggregate query with a GROUP
    BY clause is the same as the number of groups of rows produced by applying
    the GROUP BY and HAVING clauses to the filtered input dataset.
</ul>

<tcl>hd_fragment bareagg {bare aggregate terms}</tcl>
<p><b>Side note: Bare columns in an aggregate queries.</b>

The usual case is that all column names in an aggregate query are either
arguments to [aggfunc|aggregate functions] or else appear in the GROUP BY clause.
A result column which contains a column name that is not within an
aggregate function and that does not appear in the GROUP BY clause (if
one exists) is called a "bare" column.
Example:
<blockquote><pre>
SELECT a, b, sum(c) FROM tab1 GROUP BY a;







|
>
|







295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
    set of result rows. ^Subject to filtering associated with the DISTINCT
    keyword, the number of rows returned by an aggregate query with a GROUP
    BY clause is the same as the number of groups of rows produced by applying
    the GROUP BY and HAVING clauses to the filtered input dataset.
</ul>

<tcl>hd_fragment bareagg {bare aggregate terms}</tcl>
<h2>Bare columns in an aggregate query</h2>

<p>The usual case is that all column names in an aggregate query are either
arguments to [aggfunc|aggregate functions] or else appear in the GROUP BY clause.
A result column which contains a column name that is not within an
aggregate function and that does not appear in the GROUP BY clause (if
one exists) is called a "bare" column.
Example:
<blockquote><pre>
SELECT a, b, sum(c) FROM tab1 GROUP BY a;
333
334
335
336
337
338
339



340











341
342

343
344
345
346
347
348
349
the "b" column in the input row that has the largest "c" value.
There is still an ambiguity if two or more of the input rows have the
same minimum or maximum value or if the query contains more than one
min() and/or max() aggregate function. Only the built-in [min_agg|min()] 
and [max_agg|max()] functions work this way.
</p>




<p><b>4. Removal of duplicate rows (DISTINCT processing).</b>











<tcl>hd_fragment distinct</tcl>
<tcl>hd_keywords {DISTINCT}</tcl>


<p>^One of the ALL or DISTINCT keywords may follow the SELECT keyword in a 
simple SELECT statement. ^If the simple SELECT is a SELECT ALL, then the
entire set of result rows are returned by the SELECT. ^If neither ALL or
DISTINCT are present, then the behavior is as if ALL were specified. 
^If the simple SELECT is a SELECT DISTINCT, then duplicate rows are removed
from the set of result rows before it is returned. ^For the purposes of







>
>
>
|
>
>
>
>
>
>
>
>
>
>
>


>







335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
the "b" column in the input row that has the largest "c" value.
There is still an ambiguity if two or more of the input rows have the
same minimum or maximum value or if the query contains more than one
min() and/or max() aggregate function. Only the built-in [min_agg|min()] 
and [max_agg|max()] functions work this way.
</p>

<p>If bare columns appear in an aggregate query that lacks a GROUP BY clause,
and the number of input rows is zero, then the values of the bare columns
are arbitrary.  For example, in this query:

<blockquote><pre>
SELECT count(*), b FROM tab1;
</pre></blockquote>

<p>If the tab1 table contains no rows (of count(*) evaluates to 0) then
the bare column "b" will have an arbitrary and meaningless value.</p>

<p>Most other SQL database engines disallow bare columns.  If you include
a bare column in a query, other database engines will usually raise an error.
The ability to include bare columns in a query is an SQLite-specific extension.</p>

<tcl>hd_fragment distinct</tcl>
<tcl>hd_keywords {DISTINCT}</tcl>
<h2>Removal of duplicate rows (DISTINCT processing)</h2>

<p>^One of the ALL or DISTINCT keywords may follow the SELECT keyword in a 
simple SELECT statement. ^If the simple SELECT is a SELECT ALL, then the
entire set of result rows are returned by the SELECT. ^If neither ALL or
DISTINCT are present, then the behavior is as if ALL were specified. 
^If the simple SELECT is a SELECT DISTINCT, then duplicate rows are removed
from the set of result rows before it is returned. ^For the purposes of