group_concat - ORDER BY parameter please.
(1) By midijohnny on 2021-11-26 19:35:53 [link] [source]
The docs for the group_concat aggregate function states the following:
"The order of the concatenated elements is arbitrary."
And the test below at least confirms that any ORDER BY clause is indeed ignored (which I assume is meant by 'arbitrary' here).
sqlite> with t(x) AS ( values(3),(1),(2) ) select group_concat(x) from t ORDER BY x; 3,1,2
Would it be possible for this function to be enhanced in some future version so to either make use of any ORDER BY clause or perhaps have an explicit ordering expression as a parameter?
with t(x) AS ( values(3),(1),(2) ) select group_concat(x,',', ORDER BY x) from t;
I note some examples in the docs also appear to make the assumption that the 'group_concat' will output in a 'sane' order - see the CTE examples , which appear to make this assumption. (Either that: or the author happens to know that the ordering is implicitly safe in these examples).
(2) By midijohnny on 2021-11-26 20:12:17 in reply to 1 [link] [source]
Just to add that the following - using a window function - might be a suitable workaround; it works empirically for this case, but still not sure if this can be relied on.
-- SQLite 3.36.0 WITH data(x) AS ( values(3),(1),(2),(9),(5),(4),(6),(8),(7),(0) ), sorted(x) AS ( SELECT ROW_NUMBER() OVER (ORDER BY x) AS r FROM data ) SELECT group_concat(x) AS out FROM data UNION SELECT group_concat(x) AS out FROM sorted;
(3) By Keith Medcalf (kmedcalf) on 2021-11-26 20:25:36 in reply to 1 [source]
group_concat concatenates the arguments in the order of presentation. The description of this order as "arbitrary" is so because, to an uneducated observer, it appears to operate by "magic". (Remeber that any sufficiently advanced technology appears to the addle as "magic").
However, this is not actually true. The "order" is entirely deterministic and is not "arbitrary" or "magical".
Because group_concat (and every other aggregate/window function operates on the data passed to it "in order of presentement", then if you wish to have the aggregate calculated in a specific order, you have to control the order of presentement.
with t(x) as (values (3), (1), (2)) select group_concat(x, ', ') from ( select x from t order by x ) ;
(4) By Keith Medcalf (kmedcalf) on 2021-11-26 20:32:20 in reply to 3 [link] [source]
You will also note that
ORDER BY applies sortation for OUTPUT. This is because, although data processing is based on relational set algebra, current technology is restricted to serial (one after each) processing and presentement.
Current technology cannot compute the entire set operation at once, but only piecewise (one bit more after each previously). This is the ONLY reason there is any "ordering" whatsoever even though none exists.
(5) By midijohnny on 2021-11-26 21:34:37 in reply to 4 [link] [source]
Ok - thankyou - I think I get this. I am going to use the following rule of thumb here going forward:
If the thing that generates the (ordered) rows can be run in isolation - such that the ORDER BY presents the rows to the end-user in the required order - then if that is 'piped-into' anything else that uses it - its ordering is not going to change.
As opposed to something like a correlated sub-select which sees an 'outer' scope - and therefore cannot be run in isolation.
Still - it would be nice if there was an explicit parameter to 'group_concat' I think. (So others reading code would see the intention/assumption more clearly).
(6) By midijohnny on 2021-11-26 21:41:02 in reply to 3 [link] [source]
Looking at this again - I like this idiom you gave:
select group_concat(x, ', ') from ( select x from t order by x )
That is: the
can only be 'reached' by this one statement. (That is: there is absolutely no chance for anybody else to mess with this - as opposed my initial approach - somebody else could 're-use' the 'sorted' results and re-order them - even if that somebody else was myself).
(7) By anonymous on 2021-11-27 20:02:10 in reply to 1 [link] [source]
The usual work around is to move the projection to a sub query with the ORDER BY and do the aggregation and
group_concat in the outer query.
(8) By anonymous on 2021-11-28 21:35:51 in reply to 7 [link] [source]
While it is something that has been done, I don't like that.
I think that allowing
ORDER BY in non-window aggregate functions (probably using the same syntax of PostgreSQL) would be a good idea.
(If it is implemented, then it should also allow
NTH_VALUE to be used as non-window aggregate functions, since those would be meaningful if non-window aggregated
ORDER BY is allowed in this way.)
(9) By Gunter Hick (gunter_hick) on 2021-11-29 07:06:37 in reply to 1 [link] [source]
See also https://sqlite.org/quirks.html#aggregate_queries_can_contain_non_aggregate_result_columns_that_are_not_in_the_group_by_clause
You are grouping by the empty set, so the result is exactly 1 row, ordered by the arbitrarily* selected value of non-aggregate column x.
It is only if you have exactly one aggregate function, and that function is either min() or max(), that any non-aggregate columns are guaranteed to come from (any one of) the row(s) that contain the max/min value within their group.
*arbitrarily: depending on the visitation order (which may change if the query plan changes) and purposefully left unspecified, to preclude inferring any guarantee of stability