meaning of "GROUP BY CONSTANT"

(1.1) By Wang Ke (krking) on 2021-04-09 17:15:54 edited from 1.0 [link] [source]

Hello everyone,

Consider the following example:

CREATE TABLE t0 (c0 TEXT NOT NULL UNIQUE, c1 TEXT);
INSERT INTO t0 VALUES ('1', NULL), ( x'0A'||'2', NULL);
SELECT c0 FROM t0 GROUP BY 0.5;

What does the "GROUP BY 0.5" mean? And what's the expected result of the example?

Looking forward to your explanation!

(2) By Keith Medcalf (kmedcalf) on 2021-04-09 18:03:52 in reply to 1.1 [link] [source]

GROUP BY 0.5 means to group all the candidate records into a group by the constant provided. Since the expression is constant, all candidates are grouped into a single group.

This should be contrasted to the case where the expression is a "plain integer" where it represents the ordinal position of the projection expression by which to group.

Note that using any constant that is not an integer has the same effect (eg GROUP BY 'yeehaw').

The output is exactly what would be expected -- the input rows are grouped into a single group and one row of the group (chosen at random) is used to provide values for all bare columns in the projection.

(3.1) By Wang Ke (krking) on 2021-05-29 06:56:24 edited from 3.0 in reply to 2 [link] [source]

Deleted

(4) By Keith Medcalf (kmedcalf) on 2021-04-10 01:16:40 in reply to 3.0 [link] [source]

The output is not random.

It is dependent on a myriad of factors and provided those factors remain constant, the output will remain constant. While you can enumerate the respective factors and determine what the particular output will be given the specific state of those factors, unless you are willing to do this, for the lay observer, the output is inscrutable and best described as "random" in order to prevent having to make a detailed explanation of the relevant factors that is likely will not be understood by the questioner and is also subject to change without warning.

(5.2) By Wang Ke (krking) on 2021-05-29 06:55:56 edited from 5.1 in reply to 4 [source]

Deleted

(6) By Keith Medcalf (kmedcalf) on 2021-04-10 05:42:15 in reply to 5.0 [link] [source]

Is the output produced a valid answer the the question asked?

If the answer is yes then there is no problem is there?

That the answer you get is correct is guaranteed and violation of this is a bug.

If there are multiple correct answers to the question you posed then you may get back any one of them, and which one in particular is undefined. The only way to ensure that you only get one specific correct answer is to make sure that the question is framed such that there can only be that one correct answer.

(7) By Wang Ke (krking) on 2021-04-10 06:08:18 in reply to 6 [link] [source]

Okay, thank you for your time.

(8) By L Carl (lcarlp) on 2021-04-10 14:23:31 in reply to 2 [link] [source]

Whoa! I can specify a plain integer in a GROUP BY to avoid having to retype the original expression? I knew I could do this for ORDER BY, but this is news to me. Where is this documented? I did not see it in the GROUP BY Documentation. I suspect this is not "standard" SQL.

(9) By Keith Medcalf (kmedcalf) on 2021-04-10 20:25:38 in reply to 8 [link] [source]

Yes, you can do this. I do not think it is documented (though I may be wrong). Nor do I think it is standard SQL (though standard SQL is a rapidly moving target).

It works the same as the order by clause except, of course, that the referenced projection expression must not contain an aggregate (you will get an error message if you do that).

(10) By Gunter Hick (gunter_hick) on 2021-04-11 12:13:22 in reply to 8 [link] [source]

As seen on stackoverflow, mysql does that too.

https://stackoverflow.com/questions/7392730/what-does-sql-clause-group-by-1-mean

consider:

select id,1+random() from test order by random()+1; select id,random() from test order by +random();

Both of these call random() twice for each row, delivering an "unexpected" order of results.

Whereas

select id,random() from test order by random(); select id,random() from test order by 2;

both deliver the expected order

(11) By L Carl (lcarlp) on 2021-04-11 14:24:48 in reply to 10 [link] [source]

I always knew you could specify positions in ORDER BY, but I did not know about GROUP BY.

This is very useful for exploring data. Thanks. I often have long complicated expressions for GROUP BY.

I vaguely recall there is a reason why this is not implemented in, say, Oracle, for example, that there is some ambiguity that arises, but thinking about it now I cannot see any problem with it.

By experiment, it appears I can also use a column alias in the GROUP BY that is defined in the SELECT. Again, I vaguely recall there is a reason why this is not implemented in many SQL engines. In this case, I am pretty sure ambiguities can arise. It's a question of scope. For example, I can swap the names of two columns using something like:

SELECT A AS B, B AS A FROM .....;

In this case, does GROUP BY A refer to the original column name or the alias? In the case of ORDER BY, I think the alias has precedence.

Anyway, the position notation should be documented in GROUP BY. Should I try to write an edit for the documentation?

(12) By Larry Brasfield (larrybr) on 2021-04-11 20:12:17 in reply to 11 [link] [source]

Anyway, the position notation should be documented in GROUP BY. Should I try to write an edit for the documentation?

Yes, it should.

A concise extra sentence or clause which succinctly documents the "column selected by ordinal" behavior would be quite welcome. However, please do not expect to necessarily see a verbatim appearance in the revised doc(s). The more detailed explication given for ORDER BY terms could be referenced to good effect.