group_concat - ORDER BY parameter please.
(1) By midijohnny on 2021-11-26 19:35:53 [link] [source]
The docs for the group_concat aggregate function states the following:
"The order of the concatenated elements is arbitrary."
And the test below at least confirms that any ORDER BY clause is indeed ignored (which I assume is meant by 'arbitrary' here).
sqlite> with t(x) AS ( values(3),(1),(2) ) select group_concat(x) from t ORDER BY x;
3,1,2
Would it be possible for this function to be enhanced in some future version so to either make use of any ORDER BY clause or perhaps have an explicit ordering expression as a parameter?
Something like:
with t(x) AS ( values(3),(1),(2) )
select group_concat(x,',', ORDER BY x) from t;
I note some examples in the docs also appear to make the assumption that the 'group_concat' will output in a 'sane' order - see the CTE examples , which appear to make this assumption. (Either that: or the author happens to know that the ordering is implicitly safe in these examples).
(2) By midijohnny on 2021-11-26 20:12:17 in reply to 1 [link] [source]
Just to add that the following - using a window function - might be a suitable workaround; it works empirically for this case, but still not sure if this can be relied on.
-- SQLite 3.36.0
WITH
data(x) AS ( values(3),(1),(2),(9),(5),(4),(6),(8),(7),(0) ),
sorted(x) AS ( SELECT ROW_NUMBER() OVER (ORDER BY x) AS r FROM data )
SELECT group_concat(x) AS out FROM data
UNION
SELECT group_concat(x) AS out FROM sorted;
Gives:
out |
---|
1,2,3,4,5,6,7,8,9,10 |
3,1,2,9,5,4,6,8,7,0 |
Also: looks like what I'm asking for is something similar to the Oracle LISTAGG function, or the Postgres STRING_AGG function I guess.
(3) By Keith Medcalf (kmedcalf) on 2021-11-26 20:25:36 in reply to 1 [link] [source]
group_concat concatenates the arguments in the order of presentation. The description of this order as "arbitrary" is so because, to an uneducated observer, it appears to operate by "magic". (Remeber that any sufficiently advanced technology appears to the addle as "magic").
However, this is not actually true. The "order" is entirely deterministic and is not "arbitrary" or "magical".
Because group_concat (and every other aggregate/window function operates on the data passed to it "in order of presentement", then if you wish to have the aggregate calculated in a specific order, you have to control the order of presentement.
with t(x) as (values (3), (1), (2))
select group_concat(x, ', ')
from (
select x
from t
order by x
)
;
(4) By Keith Medcalf (kmedcalf) on 2021-11-26 20:32:20 in reply to 3 [link] [source]
You will also note that ORDER BY
applies sortation for OUTPUT. This is because, although data processing is based on relational set algebra, current technology is restricted to serial (one after each) processing and presentement.
Current technology cannot compute the entire set operation at once, but only piecewise (one bit more after each previously). This is the ONLY reason there is any "ordering" whatsoever even though none exists.
(5) By midijohnny on 2021-11-26 21:34:37 in reply to 4 [link] [source]
Ok - thankyou - I think I get this. I am going to use the following rule of thumb here going forward:
If the thing that generates the (ordered) rows can be run in isolation - such that the ORDER BY presents the rows to the end-user in the required order - then if that is 'piped-into' anything else that uses it - its ordering is not going to change.
As opposed to something like a correlated sub-select which sees an 'outer' scope - and therefore cannot be run in isolation.
Still - it would be nice if there was an explicit parameter to 'group_concat' I think. (So others reading code would see the intention/assumption more clearly).
(6) By midijohnny on 2021-11-26 21:41:02 in reply to 3 [source]
Looking at this again - I like this idiom you gave:
select group_concat(x, ', ')
from (
select x
from t
order by x
)
That is: the
from (...)
can only be 'reached' by this one statement.
(That is: there is absolutely no chance for anybody else to mess with this - as opposed my initial approach - somebody else could 're-use' the 'sorted' results and re-order them - even if that somebody else was myself).
(7) By anonymous on 2021-11-27 20:02:10 in reply to 1 [link] [source]
The usual work around is to move the projection to a sub query with the ORDER BY and do the aggregation and group_concat
in the outer query.
(8) By anonymous on 2021-11-28 21:35:51 in reply to 7 [link] [source]
While it is something that has been done, I don't like that.
I think that allowing ORDER BY
in non-window aggregate functions (probably using the same syntax of PostgreSQL) would be a good idea.
(If it is implemented, then it should also allow FIRST_VALUE
, LAST_VALUE
, and NTH_VALUE
to be used as non-window aggregate functions, since those would be meaningful if non-window aggregated ORDER BY
is allowed in this way.)
(9) By Gunter Hick (gunter_hick) on 2021-11-29 07:06:37 in reply to 1 [link] [source]
See also https://sqlite.org/quirks.html#aggregate_queries_can_contain_non_aggregate_result_columns_that_are_not_in_the_group_by_clause
You are grouping by the empty set, so the result is exactly 1 row, ordered by the arbitrarily* selected value of non-aggregate column x.
It is only if you have exactly one aggregate function, and that function is either min() or max(), that any non-aggregate columns are guaranteed to come from (any one of) the row(s) that contain the max/min value within their group.
*arbitrarily: depending on the visitation order (which may change if the query plan changes) and purposefully left unspecified, to preclude inferring any guarantee of stability
(10.1) Originally by anonymous with edits by Stephan Beal (stephan) on 2023-10-06 10:45:17 from 10.0 in reply to 3 [link] [source]
That is not necessarily true. If they put arbitrary , it means in later versions you may see it not maintaining order. Refer answer here: https://stackoverflow.com/a/57076660/4417449. A fool-proof way would be to do this: GROUP_CONCAT(list_order || ':' || value)
. Then wherever we need this, we can sort it using the list_order.
(edited by admin to unmangle markdown)
(11) By anonymous on 2023-10-05 20:47:58 in reply to 2 [link] [source]
Window function has one limitation (?) that prevents generic usage, i.e. with user functions: the API seem to require the implementation of the inverse
(remove a value from the aggregate that was previously added there). This works fine for sum
, but not for any complex sequential math functions, e.g. a hash value computation.