SQLite User Forum

Subquery / Aggregate function in Returning Clause
Login

Subquery / Aggregate function in Returning Clause

(1) By anonymous on 2024-04-21 05:56:16 [link] [source]

In the documentation for the RETURNING clause found at:

https://sqlite.org/lang_returning.html

under "3. Limitations And Caveats" item 6 it reads:

...If there are subqueries in the RETURNING clause, those subqueries may contain aggregates and window functions, but aggregates cannot occur at the top level.

Would you please show me any example of a subquery in a returning clause that has an aggregate function?

My specific reason for asking this question is this example:

drop table if exists test;
create table test( id integer, value tests);
insert into test values (1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e');
select * from test;
begin;
delete from test where id between 2 and 4
returning (select json_group_array(id));
rollback;
All I can get so far is:

(select json_group_array(id))
-----------------------------
[2]                          
[3]                          
[4]      

Is it possible to get one row [2,3,4]?

Thank you.

(2.1) By Mark Lawrence (mark) on 2024-04-21 14:15:13 edited from 2.0 in reply to 1 [link] [source]

Is it possible to get one row?

I would suspect not. Between the documentation quoted and the output you see, it would appear that RETURNING forces a̶ ̶s̶c̶a̶l̶a̶r̶ one result per row affected. I.e. grouping of the (top-level) result is not allowed.

The following might make it more obvious about how/where aggregate queries are allowed:

DROP TABLE IF EXISTS test;

CREATE TABLE test( id INTEGER, value tests);

INSERT INTO test
VALUES
    ( 1, 'a'),
    ( 2, 'b'),
    ( 3, 'c'),
    ( 4, 'd'),
    ( 5, 'e');

SELECT * FROM test;

BEGIN;

DELETE FROM test
WHERE id BETWEEN 2 AND 4
RETURNING id, (
    SELECT json_group_array(id) -- performed for
    FROM test                   -- each RETURN result!
  ) as result;

ROLLBACK;

-- ┌────┬───────────┐
-- │ id │  result   │
-- ├────┼───────────┤
-- │ 2  │ [1,3,4,5] │
-- │ 3  │ [1,3,4,5] │
-- │ 4  │ [1,3,4,5] │
-- └────┴───────────┘

(3) By Mark Lawrence (mark) on 2024-04-21 13:58:08 in reply to 2.0 [link] [source]

I tried putting the DELETE inside a CTE:

WITH del(id) AS (
    DELETE FROM test
    WHERE id BETWEEN 2 AND 4
    RETURNING id
)
SELECT json_group_array(id)
FROM del;

But unfortunately this errors out, and it appears the syntax is not supported by WITH. That would perhaps be a nice improvement for the developers to consider.

(4) By anonymous on 2024-04-21 16:51:50 in reply to 2.1 [link] [source]

Thanks. I understand what you mean about performed for each row, in general, since the result is a table; but for the specific example in you return, if it is run for each row, why is the result column the same for each row?

Why not as below? Why remove only the 2?

-- ┌────┬───────────┐
-- │ id │  result   │
-- ├────┼───────────┤
-- │ 2  │ [1,3,4,5] │
-- │ 3  │ [1,4,5]   │
-- │ 4  │ [1,5]     │
-- └────┴───────────┘

(5) By anonymous on 2024-04-21 17:00:26 in reply to 3 [link] [source]

Thanks. It seems that it was considered not too long ago. https://sqlite.org/forum/forumpost/0beaaf6eedaa535d2b4f4f832608020a7341da801172b6c3d1cd12270f27b67d.

It's no big deal for my current use; I just thought that I might've been overlooking something obvious.

(6) By anonymous on 2024-04-21 17:25:02 in reply to 3 [link] [source]

Or if replace the query with:

with
del_ids as 
(
 select
    id
 from
    test
 where
    id between 2 and 4
)
delete from test where id in (select id from del_ids)
returning
    id, 
   (select min(id) from del_ids) as min,
   (select max(id) from del_ids) as max;

The result is

┌────┬─────┬─────┐
│ id │ min │ max │
├────┼─────┼─────┤
│ 2  │ 3   │ 4   │
│ 3  │ 3   │ 4   │
│ 4  │ 3   │ 4   │
└────┴─────┴─────┘

What happened to the 2?

(7) By Mark Lawrence (mark) on 2024-04-22 08:11:34 in reply to 4 [link] [source]

Your question implies that you are thinking of the query execution as a set of procedural steps: I.e. that each returned calculation is made immediately after each row is deleted.

However, SQL is better understood as working with sets of data. The DELETE (or INSERT or UPDATE) is a self-contained operation, that produces a (atomic?) result, which is visible only in its entierety during the RETURNING phase.

(8) By Mark Lawrence (mark) on 2024-04-22 08:32:29 in reply to 6 [source]

Good question. I've expanded your example to illustrate the difference under DELETE versus INSERT conditions:

DROP TABLE IF EXISTS test;
CREATE TABLE test( id INTEGER, value tests);
INSERT INTO test
VALUES
    ( 1, 'a'),
    ( 2, 'b'),
    ( 3, 'c'),
    ( 4, 'd'),
    ( 5, 'e');

BEGIN;

WITH del_ids AS (
    SELECT id
    FROM test
    WHERE id BETWEEN 2
        AND 4
  )
INSERT INTO test
SELECT id, 'x'
FROM del_ids
RETURNING id,
    (
        SELECT min(id)
        FROM del_ids
      ) AS min,
    (
        SELECT max(id)
        FROM del_ids
      ) AS max;
-- ┌───────┬─────┬─────┐
-- │ rowid │ min │ max │
-- ├───────┼─────┼─────┤
-- │ 2     │ 2   │ 4   │
-- │ 3     │ 2   │ 4   │
-- │ 4     │ 2   │ 4   │
-- └───────┴─────┴─────┘

SELECT * FROM test;
-- ┌────┬───────┐
-- │ id │ value │
-- ├────┼───────┤
-- │ 1  │ a     │
-- │ 2  │ b     │
-- │ 3  │ c     │
-- │ 4  │ d     │
-- │ 5  │ e     │
-- │ 2  │ x     │
-- │ 3  │ x     │
-- │ 4  │ x     │
-- └────┴───────┘

ROLLBACK;

WITH del_ids AS (
    SELECT id
    FROM test
    WHERE id BETWEEN 2
        AND 4
  )
DELETE FROM test
WHERE id IN (
    SELECT id
    FROM del_ids
  )
RETURNING
    id,
    (
        SELECT min(id)
        FROM del_ids
      ) AS min,
    (
        SELECT max(id)
        FROM del_ids
      ) AS max;
-- ┌────┬─────┬─────┐
-- │ id │ min │ max │
-- ├────┼─────┼─────┤
-- │ 2  │ 3   │ 4   │
-- │ 3  │ 3   │ 4   │
-- │ 4  │ 3   │ 4   │
-- └────┴─────┴─────┘

SELECT * FROM test;
-- ┌────┬───────┐
-- │ id │ value │
-- ├────┼───────┤
-- │ 1  │ a     │
-- │ 5  │ e     │
-- └────┴───────┘

To my mind the DELETE result is unexpected (i.e. wrong). Do the developers or anyone else have an explaination?

(9) By Mark Lawrence (mark) on 2024-04-24 06:20:01 in reply to 8 [link] [source]

Hello devs, I suspect this has flown under the radar. Can you comment on the observed discrepancy described above? Or should I open a new issue?

(10) By Mark Lawrence (mark) on 2024-04-24 09:19:59 in reply to 8 [link] [source]

I might have found a partial explaination. It is probably not related to INSERT or DELETE, but the materialization (or not) of the CTE.

DROP TABLE IF EXISTS test;
CREATE TABLE test( id INTEGER, value tests);
INSERT INTO test
VALUES
    ( 1, 'a'),
    ( 2, 'b'),
    ( 3, 'c'),
    ( 4, 'd'),
    ( 5, 'e');

BEGIN;

    WITH del_ids AS MATERIALIZED (
        SELECT id FROM test
        WHERE id BETWEEN 2 AND 4
      )
    DELETE FROM test
    WHERE id IN ( SELECT id FROM del_ids )
    RETURNING
        id,
        ( SELECT min(id) FROM del_ids ) AS min_materialized,
        ( SELECT max(id) FROM del_ids ) AS max_materialized;

    -- ┌────┬──────────────────┬──────────────────┐
    -- │ id │ min_materialized │ max_materialized │
    -- ├────┼──────────────────┼──────────────────┤
    -- │ 2  │ 2                │ 4                │
    -- │ 3  │ 2                │ 4                │
    -- │ 4  │ 2                │ 4                │
    -- └────┴──────────────────┴──────────────────┘

ROLLBACK;
BEGIN;

    WITH del_ids AS NOT MATERIALIZED (
        SELECT id FROM test
        WHERE id BETWEEN 2 AND 4
      )
    DELETE FROM test
    WHERE id IN ( SELECT id FROM del_ids )
    RETURNING
        id,
        ( SELECT min(id) FROM del_ids ) AS min_not_materialized,
        ( SELECT max(id) FROM del_ids ) AS max_not_materialized;

    -- ┌────┬──────────────────────┬──────────────────────┐
    -- │ id │ min_not_materialized │ max_not_materialized │
    -- ├────┼──────────────────────┼──────────────────────┤
    -- │ 2  │ 3                    │ 4                    │
    -- │ 3  │ 3                    │ 4                    │
    -- │ 4  │ 3                    │ 4                    │
    -- └────┴──────────────────────┴──────────────────────┘

ROLLBACK;
BEGIN;

    WITH del_ids AS MATERIALIZED (
        SELECT id
        FROM test
      )
    INSERT INTO test
    SELECT id+10, 'x'
    FROM del_ids
    RETURNING id,
        ( SELECT min(id) FROM del_ids) AS min_materialized,
        ( SELECT max(id) FROM del_ids) AS max_materialized;

    -- ┌────┬──────────────────┬──────────────────┐
    -- │ id │ min_materialized │ max_materialized │
    -- ├────┼──────────────────┼──────────────────┤
    -- │ 11 │ 1                │ 5                │
    -- │ 12 │ 1                │ 5                │
    -- │ 13 │ 1                │ 5                │
    -- │ 14 │ 1                │ 5                │
    -- │ 15 │ 1                │ 5                │
    -- └────┴──────────────────┴──────────────────┘

ROLLBACK;
BEGIN;

    WITH del_ids AS NOT MATERIALIZED (
        SELECT id
        FROM test
      )
    INSERT INTO test
    SELECT id+10, 'x'
    FROM del_ids
    RETURNING id,
        ( SELECT min(id) FROM del_ids) AS min_not_materialized,
        ( SELECT max(id) FROM del_ids) AS max_not_materialized;

    -- ┌────┬──────────────────────┬──────────────────────┐
    -- │ id │ min_not_materialized │ max_not_materialized │
    -- ├────┼──────────────────────┼──────────────────────┤
    -- │ 11 │ 1                    │ 11                   │
    -- │ 12 │ 1                    │ 11                   │
    -- │ 13 │ 1                    │ 11                   │
    -- │ 14 │ 1                    │ 11                   │
    -- │ 15 │ 1                    │ 11                   │
    -- └────┴──────────────────────┴──────────────────────┘

ROLLBACK;

The timing of the return query execution can clearly affect the results. Although materialization is described as an optimisation, I'm not sure that the differences in (non-)materialized queries are unexpected. It would be nice if the outcomes were described better and guaranteed.

However, I still don't understand the DELETE/NOT_MATERIALIZED results. A MIN value of 3 does not conform to a pre- or post- query execution model, where I would expect 2 or NULL. And the MAX could also be expected to be NULL.

(11) By Mark Lawrence (mark) on 2024-04-24 09:37:47 in reply to 10 [link] [source]

What would PostgreSQL do? According to a test I ran, the results are consistent regardless of the materialization: the CTE tables are always in a "pre-action" state

(12) By anonymous on 2024-04-24 16:34:08 in reply to 9 [link] [source]

It appears to simple an item to have been wrong from the start.

I'm too much a novice to know how to compile older versions of SQLite and find where something changed without ruining my system. I just use the latest and let my Linux package handle updates automatically.

If you're able to locate the version in which it worked last and put (bug?) in the thread title, someone will likely respond.

I tried looking for an on-line fiddle (like the one you linked to in your later example) but they all use a version prior to 3.35.0 (2021-03-12) which precede the returning clause.

(13) By Stephan Beal (stephan) on 2024-04-24 21:45:21 in reply to 9 [link] [source]

Can you comment on the observed discrepancy described above?

This was resolved today in src:9ea6bcc8 :).

(14) By anonymous on 2024-04-25 01:37:04 in reply to 13 [link] [source]

Thank you.