Bug or feature? DISTINCT ignored in the initial select of a WITH clause

(1.1) By Dario from Italy (dario_italy) on 2020-09-16 21:55:22 edited from 1.0 [source]

Hello.

I am working with sqlite 3.33.0 and I have observed the same behavior with all previous releases I’ve tested.

Consider this table:

    CREATE TABLE t (label TEXT, step INTEGER);
    INSERT INTO T VALUES('a', 1);
    INSERT INTO T VALUES('a', 1);
    INSERT INTO T VALUES('b', 1);

I want each distinct row to produce copies with increasing step values, like this:

    a|1
    b|1
    a|2
    b|2
    a|3
    b|3

so I’ve tried the following:

    WITH RECURSIVE cte(label, step) AS (
        SELECT DISTINCT * FROM t 
      UNION ALL 
        SELECT label, step + 1 FROM cte WHERE step < 3
    ) SELECT * FROM cte;

but I’ve got duplicated rows:

    a|1
    a|1
    b|1
    a|2
    a|2
    b|2
    a|3
    a|3
    b|3

In short, the DISTINCT keyword is ignored in this context.

Of course, it is easy to get the intended result, either using the less efficient UNION instead of UNION ALL to weed out the duplicates after they have been produced, or defining a view, which works:

    CREATE VIEW t_distinct AS SELECT DISTINCT * FROM t;
    WITH RECURSIVE cte(label, step) AS (
        SELECT * FROM t_distinct
      UNION ALL
        SELECT label, step + 1 FROM cte WHERE step < 3
    ) SELECT * FROM cte;

a|1
b|1
a|2
b|2
a|3
b|3

Even

    WITH RECURSIVE CTE(label, step) AS (
        SELECT * FROM (SELECT DISTINCT * FROM t)
      UNION ALL
        SELECT label, step + 1 FROM cte WHERE step < 3
    ) SELECT * FROM cte;

works, so the unexpected behavior is not really a problem. However, I‘ve run the above code on PostgreSQL and I’ve got the intended results, and, as I understand the SQLite docs, SQLite shouldn’t ignore the DISTINCT keyword either.

A minimal intervention would be to write in the docs that SELECT DISTINCT is not supported as the initial SELECT of a WITH RECURSIVE query, and add this to the list of differences with other RDBMS and unimplemented parts of the standard.

Another option would be to raise an error to keep the unsuspecting user from running code which would yield non-standard results.

However, the best option would be to implement SELECT DISTINCT in this context too.

What do you think?

(2) By Keith Medcalf (kmedcalf) on 2020-09-16 22:48:42 in reply to 1.1 [link] [source]

Fascinating. The current trunk also displays this behaviour.

Note that if you use the following it works correctly (which is basically what the DISTINCT keyword should do):

WITH RECURSIVE cte(label, step) AS (
   SELECT * FROM t GROUP BY 1, 2
 UNION ALL 
   SELECT label, step + 1 FROM cte WHERE step < 3
)
SELECT * FROM cte;

I should think this should either be fixed or documented.

(3) By Richard Hipp (drh) on 2020-09-17 00:01:25 in reply to 1.1 [link] [source]

See ticket c51489c3b8f919c5

(4) By Dario from Italy (dario_italy) on 2020-09-17 02:47:19 in reply to 3 [link] [source]

Wow!

Already fixed!

Thank you very much for your attention. Keep up the excellent work!