Possible Wrong Results of the Group By Clause

(1) By Jinsheng Ba (bajinsheng) on 2022-05-02 02:32:49 [link] [source]

create table t2 (c0 INTEGER, c1 INTEGER);
insert into t2 values(1,3);
insert into t2 values(1,4);
insert into t2 values(1,5);
insert into t2 values(2,4);
insert into t2 values(2,5);

select * from t2 group by false;




Please see this, the return result is only one row: 1|3. Should not it returns two rows or zero row?

(2) By Keith Medcalf (kmedcalf) on 2022-05-02 04:51:08 in reply to 1 [link] [source]

The result is correct.

You have grouped by a constant. This means that all rows are a member of the one group -- and one row will be returned for this entire group.

You requested that a full-set of bare columns (*) be returned from some arbitrary row in the group. The arbitrary row chosen (not actually arbitrary, but it is magical enough to those who do not understand the technology) happens to be the first one in the group.

The projection result (a single row) is returned which has been arbitrarily determined to be the first row in the only group.

(3) By Jinsheng Ba (bajinsheng) on 2022-05-02 05:16:36 in reply to 2 [link] [source]

It looks so magic....

Thank you!

(4) By Aask (AAsk1902) on 2022-05-02 07:38:32 in reply to 2 [link] [source]

I do NOT understand the result.

When I try with another constant, namely, zero I get:

sqlite> select * from t2 group by 0;
Parse error: 1st GROUP BY term out of range - should be between 1 and 2

That suggests to me that what follows GROUP BY , when not a column name, is expected to be the ordinal position of column in the table.

The following results support this conclusion.

sqlite> select * from t2 group by 1; -- unique values in column 1
c0  c1
--  --
1   3
2   4

And

sqlite> select * from t2 group by 2; -- unique values in column 2
c0  c1
--  --
1   3
1   4
1   5

In the latter two results, the value of the column NOT being grouped by happens to be the first value ... probably because aggregation requires SUM(colName).

Given that false or true are not defined and are NOT ordinal positions of a column number, why does the SQL statement NOT fail?

(5) By Gunter Hick (gunter_hick) on 2022-05-02 08:19:20 in reply to 4 [link] [source]

RTM.

https://sqlite.org/lang_select.html

"Each ORDER BY expression is processed as follows:

If the ORDER BY expression is a constant integer K then the expression is considered an alias for the K-th column of the result set (columns are numbered from left to right starting with 1)..."

0 is less than 1. You would get the same error for every other constant integer except 1 and 2.

In https://sqlite.org/lang_expr.html, expand the literal-value railroad diagram to reveal that TRUE and FALSE are indeed literal values. While TRUE and FALSE are represented by the numerical values 1 and 0, they are NOT "constant integers" and therefore cannot refer to a column of the select list.

BTW: Using the wildcard * as a column list is discouraged except for "ad hoc queries" because a simple change of schema will change their meaning.

(6) By Aask (AAsk1902) on 2022-05-02 08:37:25 in reply to 5 [link] [source]

You asserted:

columns are numbered from left to right starting with 1

While TRUE and FALSE are represented by the numerical values 1 and 0

I agree with both.

So why does

select * from t2 group by false; -- Remembering that false = 0

NOT fail

and

select * from t2 group by 0;

fail? There is never a column 0

Also, why do

select * from t2 group by true; -- Remembering that true = 1

select * from t2 group by false; -- Remembering that false = 0

return the same result?

(7) By Harald Hanche-Olsen (hanche) on 2022-05-02 09:07:17 in reply to 6 [link] [source]

Because 0 is a literal integer constant in the syntactic sense. But false is an identifier (or keyword?) whose value happens to be zero, and that is a different thing.

(9) By Aask (AAsk1902) on 2022-05-02 09:29:30 in reply to 7 [link] [source]

To me it seems that GROUP BY can be followed one of the following:

A Column Name
OR the ordinal position of a column
OR a condition (binary, true or false)

with

select * from t2 group by false; -- false = the result of a condition

Nothing to do with a value being specified literally or by reference to a keyword or identifier.

(10) By Gunter Hick (gunter_hick) on 2022-05-02 09:41:29 in reply to 9 [link] [source]

READ THE MANUAL

(12) By Aask (AAsk1902) on 2022-05-02 09:54:25 in reply to 10 [link] [source]

As should be evident, a manual purports to describe what the code should or is meant to be doing. There is a difference between what is stated and what actually happens.

(13) By Harald Hanche-Olsen (hanche) on 2022-05-02 10:00:34 in reply to 12 [source]

I does indeed seem that the relevant page does not mention the use of literal integers to designate a GROUP BY column. It does mention this usage for ORDER BY, but that is not the same thing. Without descending into a flame war, I think we may agree that this is a documentation shortcoming that should be fixed.

(14) By Gunter Hick (gunter_hick) on 2022-05-02 10:50:22 in reply to 13 [link] [source]

GROUP BY and ORDER BY share the interpretation of literal constant integers und column aliases. Oridinal position notation is a common feature of SQL implementations and not particular to SQLite.

(24) By Harald Hanche-Olsen (hanche) on 2022-05-02 15:31:30 in reply to 14 [link] [source]

Yes, but that does not address the question of whether the SQLite documentation is adequate.

However, on a closer look, I found the relevant information in the section named Generation of the set of result rows:

If the SELECT statement is an aggregate query with a GROUP BY clause, then each of the expressions specified as part of the GROUP BY clause is evaluated for each row of the dataset according to the processing rules stated below for ORDER BY expressions. [my bold]

So I conclude that the docs don't need to be modified after all.

(15) By Aask (AAsk1902) on 2022-05-02 11:53:44 in reply to 13 [link] [source]

I atarted off stating that I do NOT understand the result. hoping for some explanation that might put the use case for this into context.

With other SQL dialects, you cannot SELECT a column unless it is included in the GROUP BY list either by name or ordinal positions of columns; the SELECTion is by recourse to aggregation e.g. SUM or AVG or MIN or MAX etc or DISTINCT.

I am aware (as is often pointed out in this forum) that SQLite (genius that it is) is different.

I still do not understand where I could use this construction of SQL.

sqlite> select count(*),* from t2 group by c0 is not null;
count(*)  c0  c1
--------  --  --
5         1   3
sqlite> select count(*),* from t2 group by false;
count(*)  c0  c1
--------  --  --
5         1   3
sqlite> select count(*),* from t2 group by (select 42); -- 42 is not an ordinal position
count(*)  c0  c1
--------  --  --
5         1   3
sqlite> select count(*),* from t2 group by (select 'Summer');
count(*)  c0  c1
--------  --  --
5         1   3
sqlite>

Like the OP, I am still wondering why

select * from t2 group by false;

this SQL statement works
it does not return two rows (because column c0 has 2 distinct values) or zero row (because false is specified).

because, as far as I can see, the result as it stands does not have any unambiguous meaning.

(16) By Gunter Hick (gunter_hick) on 2022-05-02 12:50:31 in reply to 15 [link] [source]

GROUP BY divides the input into groups according to the tuples specified.

GROUP BY 1 divides the input into groups depending on the value of the first column in the SELECT list.

GROUP BY c0 divides the input into groups depending on the value of the c0 column (in the case of there being a column named c0 in the input table or a column alias named c0 in the SELECT list)

GROUP BY <constant expression> divides the input into exactly ONE group.

Columns mentioned in GROUP BY return the value of the group.
Columns with an aggregate return the aggregated values across the group
Any other columns return the value from an arbitrarily selected row in their group, except when there is exactly one aggregate and that is MIN or MAX; in this case, their values are taken from an arbitraily selected row exhibiting the MIN or MAX value.

So "SELECT c0,c1 FROM t2 GROUP BY FALSE" creates 1 group and returns a copy of any one row. Just the same as the first two columns of  "SELECT c0,c1,FALSE FROM t2 GROUP BY 3".

If you want a result row for each distinct value of c0, you need to "SELECT c0,c1 FROM t2 GROUP BY 1" which is the same as "SELECT c0,c1 FROM t2 GROUP BY c0"

(17.1) By Aask (AAsk1902) on 2022-05-02 13:07:30 edited from 17.0 in reply to 16 [link] [source]

Thank you.

So "SELECT c0,c1 FROM t2 GROUP BY FALSE" creates 1 group and returns a copy of any one row.

The problem I have is

So "SELECT c0,c1 FROM t2 GROUP BY FALSE" creates 1 group and returns a copy of any one row.

That means that any Count(*) or SUM(col) AVG(col) etc will be consistent but a literal column selected will (might) be different given a different sequence of rows of the same table. Allowing for this, where could such a result be used predictively?

(18) By Ryan Smith (cuz) on 2022-05-02 13:16:19 in reply to 17.1 [link] [source]

That means that any Count(*) or SUM(col) AVG(col) etc will be consistent but a literal column selected will (might) be different. Allowing for this, where could such a result be used predictively?

It can't, and it shouldn't - not in any SQL.

The fact that SQLite even allows this is sometimes viewed as a mistake and we've been lobbying for a strict mode in which this is not possible. SQLite does bring a nifty feature to the table (due to allowing this) in which it will pick non-aggregate values (not enclosed in an aggregate function and not part of the group-by clause) from the same row as the aggregate MAX() or MIN() happens to be from - which is a bit laborious to do in normal SQL. It's utility depends on the case where there are no two rows both satisfying the MAX() equivalence, so unless you fully control the data shape, you can never be sure - hence, best avoided in the general case.

Towards a previous question:

Grouping by a constant, like "False" will mean a single grouping identity and so always produce a single row. To explain more clearly, all these following queries are equivalent and have the exact same result - but is very much valid and allowed:

SELECT * FROM t GROUP BY False;
SELECT * FROM t GROUP BY 'TrainSmash';
SELECT * FROM t GROUP BY Date('now');
SELECT * FROM t GROUP BY (5 IS NULL);
SELECT * FROM t GROUP BY (5 IS NOT NULL);

(22) By Aask (AAsk1902) on 2022-05-02 13:52:15 in reply to 18 [link] [source]

It can't, and it shouldn't - not in any SQL.

Consider the following:

sqlite> drop table if exists t2;
sqlite>
sqlite> create table t2 (c0 INTEGER, c1 INTEGER);
sqlite> insert into t2 values(1,3);
sqlite> insert into t2 values(1,4);
sqlite> insert into t2 values(1,5);
sqlite> insert into t2 values(2,4);
sqlite> insert into t2 values(2,5);  -- this row to become row 1 in the second pass
sqlite>
sqlite> select avg(c1),* from t2 group by false;
avg(c1)  c0  c1
-------  --  --
4.2      1   3
sqlite>
sqlite> drop table if exists t2;
sqlite>
sqlite> create table t2 (c0 INTEGER, c1 INTEGER);
sqlite> insert into t2 values(2,5); -- this row is now row 1
sqlite> insert into t2 values(1,3);
sqlite> insert into t2 values(1,4);
sqlite> insert into t2 values(1,5);
sqlite> insert into t2 values(2,4);
sqlite>
sqlite> select avg(c1),* from t2 group by false;
avg(c1)  c0  c1
-------  --  --
4.2      2   5
sqlite>

The values of c0 and c1 are different and avg(c1) is identical: the change is the sequencing of the rows in the table Not the overall content of the table.

With the first SQL:

avg(c1)  c0  c1
-------  --  --
4.2      1   3

With the second SQL:

avg(c1)  c0  c1
-------  --  --
4.2      2   5

And the avg is to do with NEITHER c0=1 and c1 = 3 NOR with c0=2 and c1 = 5 as portrayed in the respective SQL results BUT with c0, c1 in their entirety.

(23) By Gunter Hick (gunter_hick) on 2022-05-02 14:10:27 in reply to 22 [link] [source]

Both results are equally valid. Note that (4.2,2,3) would be an invalid result, because (2,3) does not exist in t2.

Here are some more fun queries (all with correct results):

asql> select avg(c1),max(c0),c1 from t2 group by false;
avg(c1)  max(c0)  c1
-------  -------  ----------
4.2      2        4
asql> select avg(c1),c0,max(c1) from t2 group by false;
avg(c1)  c0          max(c1)
-------  ----------  -------
4.2      1           5
asql> select avg(c1),max(c0),min(c1) from t2 group by false;
avg(c1)  max(c0)  min(c1)
-------  -------  -------
4.2      2        3

I will let you in on a high level secret: The "arbitrary" row depends on the visitation order. DO NOT RELY ON THE VISITATION ORDER. It is hard to predict except for very simple queries and may change with the SQLite version, the shape of your data, the selected query plan - especially in multi-tabled joins - and even running ANALYZE.

(25.2) By Aask (AAsk1902) on 2022-05-02 17:21:47 edited from 25.1 in reply to 23 [link] [source]

DO NOT RELY ON THE VISITATION ORDER.

Sound advice.

Final words:

I do NOT think I will be using this particular SQL construction.
As SQLite (forum) claims adherence to published SQL standards, I am just wondering whether this complies with those standards. As far as I see A GROUP BY expression can only contain column names from the select list, but not those used only as arguments for vector aggregates. GROUP BY should IGNORE null values - this construction does not.

sqlite> drop table t2;
sqlite> create table t2 (c0 INTEGER, c1 INTEGER);
sqlite> insert into t2 values(1,null);
sqlite> insert into t2 values(1,4);
sqlite> insert into t2 values(1,5);
sqlite> insert into t2 values(2,4);
sqlite> insert into t2 values(2,5);
sqlite>
sqlite> select * from t2 group by false;
c0  c1
--  --
1
sqlite>

(26) By Ryan Smith (cuz) on 2022-05-02 17:22:05 in reply to 25.0 [link] [source]

I do NOT think I will be using this particular SQL construction.

Good, you shouldn't.

As SQLite (forum) claims adherence to published SQL standards, I am just wondering whether this complies with those standards.

Wow, we never claimed strict adherence to the published SQL standards, In fact this construct goes against the SQL standards - I thought that was pretty clear from previous replies.

Some more notes just in case anyone is confused:

SQLite does not adhere strictly to the SQL standards, or claims to do so.
MySQL does not adhere strictly to the SQL standards, or claims to do so.
MSSQL does not adhere strictly to the SQL standards, or claims to do so.
(pretty much any DB engine you can think of) does not adhere strictly to the SQL standards, or claims to do so.

Most of them however follow pretty close, with a few exceptions/specializations (Postgres probably best of all, but still not perfectly). SQLite's handling of GROUP BY is one such specialization.

If I had my philosophical hat on, I'd say:
The SQL standard does not adhere strictly to the SQL standards.

Quoting the well-known philosopher, Captain Blackbeard:
"The SQL Standard's more suggestions than actual rules..."

(27) By Keith Medcalf (kmedcalf) on 2022-05-02 18:21:38 in reply to 26 [link] [source]

Wow, we never claimed strict adherence to the published SQL standards, In fact this construct goes against the SQL standards - I thought that was pretty clear from previous replies.

Actually, once upon a time in a land far away, this was standard SQL. Additional comparison operators were also supported that use a preceding or trailing * (or both) to indicate outer join conditions.

However, some people had difficulty in understanding what they were asking for.

It was easier to make the syntax more comlicated and disallow the usage which confused the young'uns in order to get them to STFU because, apparently, education failed to eradicate ignorance. The young'uns had 'no use' and 'couldn't understand why anyone else might' for the syntax so they decided to get rid of it entirely, by committee.

Hence the origin of the <LHSTable> <join type> JOIN <RHSTable> ON <join condition> syntax and the deprecation of allowing bare columns in a group-by projection.

(35) By Ryan Smith (cuz) on 2022-05-02 23:21:51 in reply to 27 [link] [source]

Actually, once upon a time in a land far away, this was standard SQL.

Every day, something learned.
This time you speak of truly sounds magical.

(36) By Keith Medcalf (kmedcalf) on 2022-05-03 00:06:06 in reply to 35 [link] [source]

It was. The 90's (1990's), when the first bitty-boxen SQL engine appeared. Of course, IBM DB2 worked this way for the prior two and a half decades.

(37) By Gunter Hick (gunter_hick) on 2022-05-03 09:33:46 in reply to 25.2 [link] [source]

These are the constraints of GROUP BY expressions as stated.

"The expressions in the GROUP BY clause do not have to be expressions that appear in the result. The expressions in a GROUP BY clause may not be aggregate expressions."

It is expressly NOT limited to the SELECT list. And there is nothing prohibited about SELECTing both c1 and SUM(c1) in a GROUP BY statement.

asql> select c0, c1, sum(c1) from t2 group by false;
c0|c1|sum(c1)
1|NULL|18

It is perfectly OK for SQLite to choose the row containing the NULL value as the "arbitrary row".

Where did you get the notion the GROUP BY should ignore NULL values?

"For the purposes of grouping rows, NULL values are considered equal."

Where in "GROUP BY FALSE" do you percieve a NULL value? Even GROUP BY NULL still puts all rows in a single group and correctly changes nothing in the result.

(38) By Aask (AAsk1902) on 2022-05-03 12:06:52 in reply to 37 [link] [source]

It is expressly NOT limited to the SELECT list.

I've always constructed SQL statements (Oracle, MSSQLServer) with the restriction that GROUP BY contains columns from the SELECT list. Google search provides some peripheral support for this; two links follow: In standard SQL, a query that includes a GROUP BY clause cannot refer to non-aggregated columns in the select list that are not named in the GROUP BY clause. and A column used in an expression of the SELECT clause must be in the GROUP BY clause.

I realised that SQLite does not abide and I wanted to understand this adaptation in case it proved to be an efficient shortcut for things that I do routinely but (as you can see/know) this got out of hand and, without an understanding of why it works the way it does, I will simply avoid it not least because it is not portable i.e. other providers have their own nuances in similar adaptations.

Where did you get the notion the GROUP BY should ignore NULL values?

GROUP BY does have/return null values for literal columns but not for non-literal columns. But, since SQLite until 3.37 is essentially type-less under the covers, every column can be considered a literal column. So this is not an big issue.

"arbitrary row".

That is what I see as the black hole that blights my comprehension.

Still, all is not lost; I am not pursuing this aspect of SQLite any further.

(20) By Gunter Hick (gunter_hick) on 2022-05-02 13:31:46 in reply to 17.1 [link] [source]

Try this:

CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary INTEGER);

Who is paid the most?

SELECT name,salary FROM employees WHERE salary = (SELECT MAX(salary) FROM employees);

which naively requires two full table scans (less with appropriate indices); one to determine MAX(salary) and one to look up the guy getting paid that much

or

SELECT name,MAX(salary) FROM employees;

which only needs to go through the table once BUT relies on no two employees having identical salaries (in which case it will report any one of them)

Who is paid the most in each department?

SELECT department, name, MAX(salary) FROM employees GROUP BY 1;

will give you the name of (one of) the best paid employee by department.

How to do this without non-aggregate values is left as an excercise.

(21) By TripeHound on 2022-05-02 13:51:58 in reply to 17.1 [link] [source]

Allowing for this, where could such a result be used predictively?

As Ryan says, it cannot.

Most languages allow constructs that have little or no utility... a trivial example (in no particular language, but would work in many) is:

y = 2*x + z - z + z - z + z - z

Assuming (a) the +z-z... aren't optimised out, and (b) the first +z doesn't cause any overflow behaviour, then repeatedly adding an subtracting the same value is entirely without utility, but no language I know prevents you from doing so.

Likewise, ... GROUP BY FALSE (or, to pick another of Ryan's examples, GROUP BY 'TrainCrash') have no particular utility (that I know of), but that – in itself – is not a reason to disallow such constructs.

Since GROUP BY allows the use of an expression for "the thing to group by" (and I can imagine uses for this), and the fact that FALSE is what you might call a "degenerate expression", it would (probably) be very hard to allow (useful) expressions but disallow things like FALSE.

(28) By Keith Medcalf (kmedcalf) on 2022-05-02 20:10:13 in reply to 17.1 [link] [source]

Allowing for this, where could such a result be used predictively?

What does this have to do with anything? Just because you (one instance of million upon millions of SQLite3 users) see no utility does not mean that one of the other millions upon millions does not see utility.

Why do you think that your limited vision should require everyone else to wear blinders?

(29) By Aask (AAsk1902) on 2022-05-02 20:33:50 in reply to 28 [link] [source]

Allowing for this, where could such a result be used predictively?

This question remains unanswered. Put simply, your response does not add any clarification.

Additionally,

GROUP BY should IGNORE null values - this construction does not.

I do not understand how a an SQL statement with GROUP BY can return a column with a null value.

one instance of million upon millions of SQLite3 users

Just to be pedantic, its at least TWO, the other being the OP who asked Should not it returns two rows or zero row?

Why do you think that your limited vision should require everyone else to wear blinders?

I did start off with a simple statement: I do NOT understand the result. Twenty-eight responses later, I still do NOT understand the result.

In case you have any doubt, I state explicitly: I have no desire for descending into a flame war

(30) By Keith Medcalf (kmedcalf) on 2022-05-02 20:52:20 in reply to 29 [link] [source]

I do not understand how a an SQL statement with GROUP BY can return a column with a null value.

Because that is what was asked for.

GROUP BY should IGNORE null values - this construction does not.

This is untrue. null is a value. I do not know wherefrom you got this rather strange idea.

sqlite> with t(a,b) as (values (1,1),(1,2),(null,1),(null,2),(3,4),(3,5))
   ...> select a, sum(b) from t group by a;
┌──────┬────────┐
│  a   │ sum(b) │
├──────┼────────┤
│ NULL │ 3      │
│ 1    │ 3      │
│ 3    │ 9      │
└──────┴────────┘
VM-steps: 172
Run Time: real 0.014 user 0.000000 sys 0.000000
sqlite>

(31) By Keith Medcalf (kmedcalf) on 2022-05-02 20:56:11 in reply to 30 [link] [source]

And this returns even more null values:

sqlite> with t(a,b) as (values (1,1),(1,2),(null,1),(null,2),(3,4),(3,5),(4,NULL))
   ...> select a, sum(b) from t group by a;
┌──────┬────────┐
│  a   │ sum(b) │
├──────┼────────┤
│ NULL │ 3      │
│ 1    │ 3      │
│ 3    │ 9      │
│ 4    │ NULL   │
└──────┴────────┘

(32) By Keith Medcalf (kmedcalf) on 2022-05-02 21:02:55 in reply to 31 [link] [source]

And that can bew extended thusly:

sqlite> with t(a,b) as (values (1,1),(1,2),(null,1),(null,2),(3,4),(3,5),(4,NULL))
   ...> select a, sum(b), b, firstnotnull(b), lastnotnull(b) from t group by a;
┌──────┬────────┬──────┬─────────────────┬────────────────┐
│  a   │ sum(b) │  b   │ firstnotnull(b) │ lastnotnull(b) │
├──────┼────────┼──────┼─────────────────┼────────────────┤
│ NULL │ 3      │ 1    │ 1.0             │ 2.0            │
│ 1    │ 3      │ 1    │ 1.0             │ 2.0            │
│ 3    │ 9      │ 4    │ 4.0             │ 5.0            │
│ 4    │ NULL   │ NULL │ NULL            │ NULL           │
└──────┴────────┴──────┴─────────────────┴────────────────┘

firstnotnull and lastnotnull are aggregate functions that do what they say as the group is traversed. (They are custom aggregates, not part of the standard distribution).

(33) By Keith Medcalf (kmedcalf) on 2022-05-02 21:05:16 in reply to 32 [link] [source]

Even this:

sqlite> with t(a,b) as (values (1,1),(1,2),(null,null),(null,1),(null,2),(3,4),(3,5),(4,NULL))
   ...> select a, sum(b), b, firstnotnull(b), lastnotnull(b) from t group by a;
┌──────┬────────┬──────┬─────────────────┬────────────────┐
│  a   │ sum(b) │  b   │ firstnotnull(b) │ lastnotnull(b) │
├──────┼────────┼──────┼─────────────────┼────────────────┤
│ NULL │ 3      │ NULL │ 1.0             │ 2.0            │
│ 1    │ 3      │ 1    │ 1.0             │ 2.0            │
│ 3    │ 9      │ 4    │ 4.0             │ 5.0            │
│ 4    │ NULL   │ NULL │ NULL            │ NULL           │
└──────┴────────┴──────┴─────────────────┴────────────────┘

(34) By Aask (AAsk1902) on 2022-05-02 21:17:31 in reply to 32 [link] [source]

Keith, point taken or if you prefer you made your point.

I'll take this as SQLite's handling of GROUP BY is one such specialization.

To clarify further, an SQL statement involving GROUP BY CAN include column names in its SELECT that are NOT in its GROUP BY AND SO (as you demonstrate) without aggregation (Avg, Min, Max etc. or DISTINCT) thereby (potentially) introducing null values in its result.

In your example:

select a, sum(b), b, firstnotnull(b), lastnotnull(b) from t group by a;

SELECT includes b which is not in GROUP BY, and includes a without aggregation.

(40) By Vadim Goncharov (nuclight) on 2022-05-08 16:23:33 in reply to 30 [link] [source]

This is untrue. null is a value. I do not know wherefrom you got this rather strange idea.

Just to be pedantic :) NULL is not a value - in relational algebra it's an absence of value in tuple, just mapped this "like a special value"-way in SQL and it's implementations.

(41) By Keith Medcalf (kmedcalf) on 2022-05-08 21:23:46 in reply to 40 [link] [source]

NULL is a value that indicates that the actual value is unknown. Were it not a value in its own right (just as 1 is a value) then it would not exist.

(42) By Harald Hanche-Olsen (hanche) on 2022-05-09 07:55:15 in reply to 41 [link] [source]

Yesterday, upon the stair, I met a man who wasn't there …

But seriously,

NULL is a value that indicates that the actual value is unknown.

I sometimes use it to indicate that no value can be meaningfully assigned to this slot. Which is slightly different from a merely unknown, but definite, value.

(43) By Vadim Goncharov (nuclight) on 2022-05-09 12:52:30 in reply to 41 [link] [source]

Well, "is a value" is implementation detail of SQL and high-level dynamic scripting languages (fun with JS's both null and undefined, BTW), when a variable, or table cell, physically must to hold something.

But if we take origin theory (relational algebra), and rewrite the followin table...

col1	col2	col3
1	2	3
4	NULL	6

...as tuples in somewhat Lisp-like parlance, we'll get:

(
(row (col1 1) (col2 2) (col3 3))
(row (col1 4) (col3 6))
)

that is, nothing in that place. Or in JSON:

[
  {
     col1: 1,
     col2: 2,
     col3: 3
  },
  {
     col1: 4,
     col3: 6
  }
]

It's just the need that table is actually array of arrays to have it as [ [1, 2, 3], [4, null, 6] ]

BTW, in such scripting languages another problem arises - how to differentiate above version from e.g.:

  {
     col1: 4,
     col2: null,
     col3: 6
  }

but this is unexpressionable in pure standard SQL so out of scope here :)

There are also lengthy philosophical discussions on NULL and SQL subjects on interesting site https://wiki.c2.com/

(19) By anonymous on 2022-05-02 13:19:31 in reply to 15 [link] [source]

All the examples given do not use a "literal integer constant" for GROUP BY. E.g. "false" is a literal constant, but not an integer literals (https://sqlite.org/quirks.html states that true and false are BOOLEAN literals); "(select 42)" is not a "literal" (but an expression that evaluates to the same value for each row: 42).

Hence, in none of the cases the value is interpreted as "an alias for the K-th column".

Since in the 4 given examples, per query, the expression evaluates to the same value for each row, all rows in the table belong to the same group (defined by that single value). This explains why there is only one result row in all the examples.

I could not find what constitutes a literal integer constant in the Sqlite documentation. The railroad diagrams on https://sqlite.org/syntax/literal-value.html does not detail numeric-literal any further.

From experiments I found that "0x1" is considered a literal integer constant. The literals "1.0", "1.", and "1e0" are NOT integers (but real numbers), nor is "'1'", "+'1'" (text), nor "CAST('1' AS INT)" (integer expression, not a literal). Hence, they do not select a specific column (but are interpreted as a GROUP BY expression).

However, I was a surprised that "- - + - - + - - 1" is considered a literal integer constant (sqlite_version 3.34): select c0, count() from t2 group by - - + - - + - - 1; 1|3 2|2 Appearently, I can prefix a literal number with an arbitrary number of unary minus-signs and plus-signs. It is still a literal.

One practical use case for GROUP BY a constant value: I sometimes use grouping-by-a-constants-value if I want a records count, but no result rows for empty tables. CREATE TABLE t3 (c0 INTEGER, c1 INTEGER); select count() from t3 group by 'ALL'; (no results) whereas leaving out GROUP BY always gives a result row select count() from t3; 0

(11) By Harald Hanche-Olsen (hanche) on 2022-05-02 09:52:04 in reply to 9 [link] [source]

GROIUP BY can be followed by any expression, including a literal integer.

The only magic is that literal integers are treated differently than other expressions. But there is no special treatment of “conditions”, whatever they may be. Note that sqlite does not have a boolean data type; it uses the integer 0 and and 1 instead.

Right now, I can't find the part of the documentation where using ordinal column numbers is mentioned. But it's a standard SQL construct, I think. It is also one that leads to much confusion. I think the language might have been better without it.

(8) By Gunter Hick (gunter_hick) on 2022-05-02 09:19:37 in reply to 6 [link] [source]

"false" is not an "integer constant".

Grouping by a constant expression (e.g. false, 0.0 or even 'false') puts all rows into the same group. Any non-aggregate columns are take from the same, arbitrarily chosen row that is a member of the current group.

Both your selevt statements happen to choose the same row to return values from.

(39) By jose isaias cabrera (jicman) on 2022-05-03 13:17:07 in reply to 8 [link] [source]

WOW! The knowledge and wisdom on this forum is amazing. Thanks all for your wonderful contribution to teaching SQL in here. As we say in Spanish, "Muchas gracias."

josé