Bug or feature? DISTINCT ignored in the initial select of a WITH clause
(1.1) By Dario from Italy (dario_italy) on 2020-09-16 21:55:22 edited from 1.0 [source]
Hello.
I am working with sqlite 3.33.0 and I have observed the same behavior with all previous releases I’ve tested.
Consider this table:
CREATE TABLE t (label TEXT, step INTEGER); INSERT INTO T VALUES('a', 1); INSERT INTO T VALUES('a', 1); INSERT INTO T VALUES('b', 1);
I want each distinct row to produce copies with increasing step
values, like this:
a|1 b|1 a|2 b|2 a|3 b|3
so I’ve tried the following:
WITH RECURSIVE cte(label, step) AS ( SELECT DISTINCT * FROM t UNION ALL SELECT label, step + 1 FROM cte WHERE step < 3 ) SELECT * FROM cte;but I’ve got duplicated rows:
a|1 a|1 b|1 a|2 a|2 b|2 a|3 a|3 b|3
In short, the DISTINCT keyword is ignored in this context.
Of course, it is easy to get the intended result, either using the less efficient UNION
instead of UNION ALL
to weed out the duplicates after they have been produced, or defining a view, which works:
CREATE VIEW t_distinct AS SELECT DISTINCT * FROM t; WITH RECURSIVE cte(label, step) AS ( SELECT * FROM t_distinct UNION ALL SELECT label, step + 1 FROM cte WHERE step < 3 ) SELECT * FROM cte;a|1 b|1 a|2 b|2 a|3 b|3
Even
WITH RECURSIVE CTE(label, step) AS ( SELECT * FROM (SELECT DISTINCT * FROM t) UNION ALL SELECT label, step + 1 FROM cte WHERE step < 3 ) SELECT * FROM cte;works, so the unexpected behavior is not really a problem. However, I‘ve run the above code on PostgreSQL and I’ve got the intended results, and, as I understand the SQLite docs, SQLite shouldn’t ignore the
DISTINCT
keyword either.
A minimal intervention would be to write in the docs that SELECT DISTINCT
is not supported as the initial SELECT
of a WITH RECURSIVE
query, and add this to the list of differences with other RDBMS and unimplemented parts of the standard.
Another option would be to raise an error to keep the unsuspecting user from running code which would yield non-standard results.
However, the best option would be to implement SELECT DISTINCT
in this context too.
What do you think?
D.
(2) By Keith Medcalf (kmedcalf) on 2020-09-16 22:48:42 in reply to 1.1 [link] [source]
Fascinating. The current trunk also displays this behaviour.
Note that if you use the following it works correctly (which is basically what the DISTINCT keyword should do):
WITH RECURSIVE cte(label, step) AS (
SELECT * FROM t GROUP BY 1, 2
UNION ALL
SELECT label, step + 1 FROM cte WHERE step < 3
)
SELECT * FROM cte;
I should think this should either be fixed or documented.
(3) By Richard Hipp (drh) on 2020-09-17 00:01:25 in reply to 1.1 [link] [source]
See ticket c51489c3b8f919c5
(4) By Dario from Italy (dario_italy) on 2020-09-17 02:47:19 in reply to 3 [link] [source]
Wow!
Already fixed!
Thank you very much for your attention. Keep up the excellent work!
D.