Possible bug in group_concat() ?
(1) By Mark Benningfield (mbenningfield1) on 2021-09-28 01:25:19
While testing my own table-valued function, I got some rather weird results when running a window-function test. Try as I might, I couldn't account for it in my own code. The debugger showed the correct output being assigned on each call to the `xColumn()` function. So, I ran the same test with the `generate_series()` function as a sanity check, using the following query: CREATE TABLE persons(id INTEGER PRIMARY KEY, name TEXT); INSERT INTO persons (name) VALUES('John'), ('Paul'), ('George'), ('Ringo'); SELECT group_concat(value,name) OVER ( ORDER BY name ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING ) AS result FROM persons, generate_series(4450,4455); Granted, the query is nonsense, but the issue is not that the results don't make any sense; it's that information is missing: result ------------------------- 4450George4451 4450George4451George4452 4451George4452George4453 4452George4453George4454 4453George4454George4455 4454George4455John4450 4455John4450John4451 50John4451John4452 <-- initial values begin to be truncated 51John4452John4453 52John4453John4454 53John4454John4455 54John4455Paul4450 55Paul4450Paul4451 50Paul4451Paul4452 51Paul4452Paul4453 52Paul4453Paul4454 53Paul4454Paul4455 54Paul4455Ringo4450 55Ringo4450Ringo4451 450Ringo4451Ringo4452 451Ringo4452Ringo4453 452Ringo4453Ringo4454 453Ringo4454Ringo4455 454Ringo4455 shell `.version` command: SQLite 3.36.0 2021-06-18 18:36:39 5c9a6c06871cb9fe42814af9c039eb6da5427a6ec28f187af7ebfb62eafa66e5 zlib version 1.2.11 gcc-5.2.0 I compiled `series.c` as a 32-bit dll using Visual Studio 2015 (_MSC_VER 1900) Regrettably, I'm not set up for testing sqlite itself, and besides, the amalgamation is apparently too large for the VS debugger (it can never seem to break on the correct source line).
(2.1) By Larry Brasfield (larrybr) on 2021-09-28 16:46:13 edited from 2.0 in reply to 1 [link]
As I read the code which implements group_concat(), I do not see why it does this. It does appear to be a bug, and it is somewhat alarming as well. (It makes me wonder what happens when the effect is more pronounced.) As you likely know, group_concat( whatever, separator ) is normally used with a fixed separator rather than one which can change with each member of the aggregate. This is why such strange behavior has escaped notice for so long. I will take a look at this running under a debugger. (Update via edit:) It is pretty clear why this happens. As the window moves, concatenated values are being removed from the front of the accumulation and the wrong separator (off by 1) values are "removed". I'm devising a fix that does not penalize the common, non-varying separator case while handling the weird, varying separator case.
(3) By Ryan Smith (cuz) on 2021-09-28 15:04:55 in reply to 2.0 [link]
Larry, running some tests on my side (but not with debugger, and not being overly familiar with the SQLite internals), it's real hard to find other ways to cause this, but I did come to believe the problem originates in the Window function return rather than the group_concat() code itself. This deduced by manner of poking the black box with a stick and seeing how it rolls, so I could be very wrong, but posted on the off chance it saves some bug-hunting time.
(4) By Larry Brasfield (larrybr) on 2021-09-29 01:12:11 in reply to 3 [link]
Thanks for the idea. As it turns out, the problem is in the group_concat() logic for removing row effects from the aggregation as a window slides. The misbehaving logic was to remove a concatenee and the separator with which it was associated in the call to group_concat(). Unfortunately, the first such call in an aggregation tosses the separator; the first utilized separator is that passed in the next group_concat() call. So, somehow, the removal has to remove the Nth passed concatenee and the N+1th passed separator. This requires some state which is not available to the function without it storing separator lengths embedded in the aggregated concatenation. Because that could be expensive, and is rarely needed (because separators usually do not vary), this storage is undertaken only when needed.
(5) By Mark Benningfield (mbenningfield1) on 2021-09-29 02:25:59 in reply to 4 [link]
Thanks for the fast fix. v3.37.0 is due out in a couple of weeks isn't it?
(6) By Larry Brasfield (larrybr) on 2021-09-29 02:51:25 in reply to 5 [link]
My recollection is that the v3.37.0 release is nearly a month away. Its date is not fixed either, and depends on testing results.
(7) By Larry Brasfield (larrybr) on 2021-09-29 15:38:12 in reply to 5 [link]
Mark, and any other interested reader: The group_concat() function is documented, for N effective calls, to concatenate N argument 1 values separated by N-1 argument 2 (or its default) values. It is completely undocumented which one of the passed-in argument 2 values is dropped. In the interest of code simplicity and performance, I am about to cause the Nth argument 2 value to be dropped instead of the 1st argument 2 value as happens with existing releases. I would be interested to see the best argument for retaining the current undocumented behavior, in particular some affected, reasonable use case(s). (No need to bother saying undocumented behavior should never change. We see that as having value which competes with other values.)
(8) By Mark Benningfield (mbenningfield1) on 2021-09-29 21:26:05 in reply to 7 [link]
Since the `group_concat()` function was converted to an aggregate window function, then it should inverse what was _actually_ accumulated. It looks like the new behavior does that. I'm not set up to build the amalgamation from source (if I need to build sqlite, I just build the amalgamation, because that's easy, and I'm lazy :) ). Maintaining the state necessary to be aware of the lengths of each added separator naturally leads to inversing the Nth separator instead of just the first one. However, with the "off-by-1" problem fixed, if the separators are the same length, like in the query I used, it shouldn't matter really with regard to the old behavior versus the new, because the result will be truncated by the correct amount in either case. I submit that, to the extent that the old behavior conflicts with the new, the old behavior is not acting correctly as a window function, in that it is not removing _exactly_ what was added.