Invalid group by usage?

(1) By zhangyushao (zhangysh1995) on 2020-08-06 04:17:43 [link] [source]

In short, the following query shouldn't be executable and it produces not meaningful result:

sqlite> SELECT COUNT() AS field1, `col_char(20)_signed` FROM table_10_undef_undef ;
10|you

How to reproduce:

create table table_10_undef_undef (
`pk` int primary key,
`col_bigint_signed` bigint ,
`col_float_signed` float ,
`col_double_signed` double ,
`col_char(20)_signed` char(20) ,
`col_varchar(20)_signed` varchar(20) ,
`col_tinyint_signed` tinyint ,
`col_smallint_signed` smallint 
);

insert into table_10_undef_undef values (0,-9.183,12.991,-0,"you",null,'a','c'),(1,-9553,1,-9.183,'w',null,"back","not"),(2,1,-1,0,null,'x','q',null),(3,-8,-0,-0,null,"is",'b',null),(4,18,0,12.991,'r','w','p',null),(5,117,12.991,1.009,'c',null,'v',"will"),(6,18.0266,-1,-9.183,"can","okay",null,"i"),(7,12.991,1,-121,'w',null,null,'j'),(8,-9.183,100,-9.183,"right",'g','o','a'),(9,100,null,12.991,'s',null,'r',"want");

SELECT ~ ( COUNT(  ) ) AS field1, `col_char(20)_signed` FROM table_10_undef_undef ;

(2) By Gunter Hick (gunter_hick) on 2020-08-06 07:34:28 in reply to 1 [link] [source]

This is documented. SQLite allows nonaggregated columns in aggregate queries.

If there is exactly one aggregate function in the query, and that function is MIN() or MAX(), then the column will be taken from one of the rows that exhibits the MIN or MAX value (if a GROUP BY is added, this applies to the groups' MIN or MAX value respectively).

SELECT max(a),b FROM T;

is equivalent to

SELECT a,b FROM T where a = (SELECT max(a) FROM T) LIMIT 1;

but requires only 1 traversal of the table instead of an average 1.5

(3) By zhangyushao (zhangysh1995) on 2020-08-06 07:37:42 in reply to 2 [link] [source]

There is no MIN or MAX in the query, it uses COUNT. Is this also a documented behavior?

(4) By Gunter Hick (gunter_hick) on 2020-08-06 08:12:38 in reply to 3 [link] [source]

Yes. It will return the value from one of the rows.

(7) By zhangyushao (zhangysh1995) on 2020-08-06 08:51:07 in reply to 4 [link] [source]

From the rows in the table, I don't know how -11 could select a row.

sqlite> .mode column
sqlite> .headers on
sqlite> select * from table_10_undef_undef;
pk  col_bigint_signed  col_float_signed  col_double_signed  col_char(20)_signed  col_varchar(20)_signed  col_tinyint_signed  col_smallint_signed
--  -----------------  ----------------  -----------------  -------------------  ----------------------  ------------------  -------------------
0   -9.183             12.991            0.0                you                                          a                   c
1   -9553              1.0               -9.183             w                                            back                not
2   1                  -1.0              0.0                                     x                       q
3   -8                 0.0               0.0                                     is                      b
4   18                 0.0               12.991             r                    w                       p
5   117                12.991            1.009              c                                            v                   will
6   18.0266            -1.0              -9.183             can                  okay                                        i
7   12.991             1.0               -121.0             w                                                                j
8   -9.183             100.0             -9.183             right                g                       o                   a
9   100                                  12.991             s                                            r                   want

(12) By Gunter Hick (gunter_hick) on 2020-08-06 12:34:59 in reply to 7 [source]

There are 10 rows, so the result of count() is 10.
The col_char(20)_signed field of the first row is 'you'.

So a result tuple of (10, 'you') is perfectly plausible and just as correct as returning any other value from the aforementioned column.

I have no idea where your value of -11 comes from.

BTW: SQLite does not care about the declared dimension of a text or blob field, as only the actual content gets stored. You can also store any type of value in any field independent of its declared type.

(13) By Keith Medcalf (kmedcalf) on 2020-08-06 12:52:21 in reply to 12 [link] [source]

-11 is the complement of 10 ...

sqlite> select ~10;
┌─────┐
│ ~10 │
├─────┤
│ -11 │
└─────┘

Presumably this is a reference to the ~ ( count(*) ) that was scattershot.

(5) By Keith Medcalf (kmedcalf) on 2020-08-06 08:20:53 in reply to 3 [link] [source]

I would expect that the value would come from one of the rows forming the group. In the specific case of min/max this is further defined to be one of the rows in the group containing the applicable min/max.

I believe this is documented along with the min/max special case.

Note that this used to be the common behaviour of ALL implementations of SQL/RDBMS systems before the great unleashing (also known as eternal September).

(6) By zhangyushao (zhangysh1995) on 2020-08-06 08:45:46 in reply to 5 [link] [source]

Could you point me to the specified documentation page? I tried to search for it but I only found this

(8) By Keith Medcalf (kmedcalf) on 2020-08-06 11:00:06 in reply to 6 [link] [source]

https://www.sqlite.org/lang_select.html

On the page that documents the SELECT statement. There is a whole section about it:

Side note: Bare columns in an aggregate queries. The usual case is that all column names in an aggregate query are either arguments to aggregate functions or else appear in the GROUP BY clause. A result column which contains a column name that is not within an aggregate function and that does not appear in the GROUP BY clause (if one exists) is called a "bare" column. Example:

SELECT a, b, sum(c) FROM tab1 GROUP BY a;

In the query above, the "a" column is part of the GROUP BY clause and so each row of the output contains one of the distinct values for "a". The "c" column is contained within the sum() aggregate function and so that output column is the sum of all "c" values in rows that have the same value for "a". But what is the result of the bare column "b"? The answer is that the "b" result will be the value for "b" in one of the input rows that form the aggregate. The problem is that you usually do not know which input row is used to compute "b", and so in many cases the value for "b" is undefined.

Special processing occurs when the aggregate function is either min() or max(). Example:

SELECT a, b, max(c) FROM tab1 GROUP BY a;

When the min() or max() aggregate functions are used in an aggregate query, all bare columns in the result set take values from the input row which also contains the minimum or maximum. So in the query above, the value of the "b" column in the output will be the value of the "b" column in the input row that has the largest "c" value. There is still an ambiguity if two or more of the input rows have the same minimum or maximum value or if the query contains more than one min() and/or max() aggregate function. Only the built-in min() and max() functions work this way.

(9) By zhangyushao (zhangysh1995) on 2020-08-06 11:20:30 in reply to 8 [link] [source]

Thank you for the information.

It seems the result row is undefined.

(10) By Keith Medcalf (kmedcalf) on 2020-08-06 11:49:35 in reply to 9 [link] [source]

That depends on your definition of "undefined". The result of such bare columns will be taken from a single row in the group satisfying the aggregate.

How that particular single row is chosen if there are multiple to choose from appears "undefined" to an uneducated consumer. (Hence the reference to eternal September)

This is the same as saying any sufficiently advanced technology appears to be magic to those who do not understand the technology.

However there is most certainly a result and it most certainly is the result of the working of a deterministic algorithm which chooses the particular row, so it is far from "undefined".

In fact, the result is in fact guaranteed.

(11) By zhangyushao (zhangysh1995) on 2020-08-06 11:56:00 in reply to 10 [link] [source]

I believe I understand the documentation. So I said undefined because it uses this word.