How should RETURNING work with a cascading delete?

(1) By Richard Hipp (drh) on 2021-01-31 18:22:00 [link] [source]

Consider the following hypothetical SQL:

PRAGMA foreign_keys=on;
CREATE TABLE t1(
  a INTEGER PRIMARY KEY,
  b INTEGER REFERENCES t1 ON DELETE CASCADE
);
INSERT INTO t1(a,b) VALUES(0,null),(1,0),(2,1),(3,2),(4,null);
DELETE FROM t1 WHERE a=0 RETURNING a;

The final DELETE should delete rows 0, 1, 2, and 3 due to the ON DELETE CASCADE clause, and indeed it does.

Question:

Should the RETURNING clause return all four deleted rows, or should it only return the specified row 0?

Reasons to return all four rows:

If you have a RETURNING clause, presumably you want to see every row that has been deleted, regardless of whether or not that row was deleted directly or via a cascading delete.
If you don't return all rows, then the result of RETURNING can vary according to the order of the delete. If the final statement is "DELETE FROM t1 WHERE a IN (0,1,2,3) RETURNING a;" then you might get "0 1 2 3" or you might get "0" or you might get something in between depending on the delete order and the timing of the cascading delete.
Triggers and other side effects fire on cascading deletes. Why should RETURNING be any different?

Reasons to return just the one row 0:

The RETURNING clause cannot report on cascading deletes in other tables, since the argument to RETURNING must refer to columns in the primary table. So if cascading deletes cannot reported in all tables, why report them in the primary table? Shouldn't all tables be treated the same?
PostgreSQL 9.6 works this way.

(2) By Larry Brasfield (LarryBrasfield) on 2021-01-31 20:44:04 in reply to 1 [source]

(Speaking for myself only:)

The cascading delete case you present is an outlier. In the usual case, cascaded deletions will happen in an arbitrary set of tables, which only by coincidence includes the origination of the cascade. I see no reason that it should be made easier to get the cascaded results only for that one somewhat special case when so many other cascaded results are unavailable that way. Most foreign keys reference other tables. (I suspect that is why they are named "FOREIGN".) Of course, this is just reiterating your point, "Shouldn't all tables be treated the same?" Yes, it is best that they are, to ease human conceptualization.

The consequence, "you might get something in between depending on the delete order and the timing", gives me a shuddering dread. SQL is best considered as a way of specifying set operations. That "delete order" and "timing" dependence is a monkey wrench thrown into that eminently useful conceptualization.

(3) By Ryan Smith (cuz) on 2021-02-01 00:16:55 in reply to 1 [link] [source]

Definitely the latter.

The main reason a programmer needs RETURNING is that he/she can ascertain which rows he/she caused to be added/deleted/touched directly and not as a consequence of a FK or TRIGGER or other Schemata, that is for the DB Schema's concern, and if perchance a programmer IS interested in all effects, that is the complicated case and then more queries are warranted.

The latter case is how it works in most engines and is expected to work in the general case (I can't speak for ALL engines or ALL expectations of course).

PS: It's very exciting seeing the question asked. :)

(4) By anonymous on 2021-02-01 07:11:25 in reply to 1 [link] [source]

Prefer

If you have a RETURNING clause, presumably you want to see every row that has been deleted, regardless of whether or not that row was deleted directly or via a cascading delete.

This gives me the option to undo.

(5) By anonymous on 2021-02-01 07:48:47 in reply to 1 [link] [source]

Maybe:

PRAGMA returning_indirectly_touched_rows=yes/no

(6) By Holger J (holgerj) on 2021-02-01 12:14:30 in reply to 1 [link] [source]

Do it the same way PostgreSQL does.

(7) By Richard Hipp (drh) on 2021-02-01 12:52:15 in reply to 5 [link] [source]

I was thinking something like:

 ... RETURNING *; -- Postgres behavior

Versus:

 ... RETURNING RECURSIVE *;  -- Also reports cascading changes

In other words, add the RECURSIVE keyword after RETURNING to cause foreign-key updates to the same table to be reported, and use the traditional syntax without RECURSIVE to report only the rows that are directly modified and omit changed by foreign key constraints or auxiliary triggers.

This remains an option for the future. For now, at least, the behavior of SQLite has been shifted to align with Postgres. Such was the recommendation of the Postgres devs, in addition to the majority recommendation of the responses on this forum thread.

(8) By ddevienne on 2021-02-01 13:06:52 in reply to 1 [link] [source]

I've wished for a way to find out the effects of CASCADE many times,
but across the board, in any table, not just for self-referential FKs.

So I agree with others, that direct effets should be returned only,
and that same-table CASCASE side-effets are out-of-bounds,
for compatibility with PostgreSQL (and likely other RDBMSs).

Regarding your point 1.2, I guess that implies that the CASCADE is applied
depth-first per-deleted-row, rather than accumulated for post-processing
(kinda like a breath-first delete). That's unfortunate, and of course an
implementation detail I guess. Could be a nasty gotcha indeed.

Still, compatibility matters more IMHO.
The doc could just mention that pathological case perhaps.

BTW, option #1 would violate what changes returns, no?
So that clearly points to option#2 IMHO, for internal SQLite consistency.

Also, what would be allowed after RETURNING? Any select-clause expr?
Can I RETURNING COUNT(*) to emulate sqlite3_changes() for example?

(9) By ddevienne on 2021-02-01 13:09:17 in reply to 7 [link] [source]

An opt-in way to get the more deterministic but less compatible full side-effets (in the same table) sounds great, as a future extension. The best of both worlds.

(10) By anonymous on 2021-02-01 15:43:35 in reply to 8 [link] [source]

I've wished for a way to find out the effects of CASCADE many times, but across the board, in any table, not just for self-referential FKs.

Perhaps the session extension or the pre-update hook could be useful?