Novice question concerning why a select query provides desired results but similar update query does not?
(1) By Gary (1codedebugger) on 2022-12-13 01:12:01 [link] [source]
I'm trying to update a table to correct an error in the data that was loaded into SQLite. This is an example of the issue.
A code was skipped in arranging the data to be loaded, such that, beginning at indexInter = 4, the column "code" is one row out-of-sync (behind) with the rest of the data in each row. If I run the example select query below, the results are what is needed; but, when that query is modified to an update, all rows are updated to the first matching row.
Woudl you please tell me what novice error I am making and fundamental principle I am overlooking? Thank you.
sqlite> .dump testUpdate
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE testUpdate (indexRow integer unique, indexInter integer, code text);
INSERT INTO testUpdate VALUES(130120390150,1,'H1961');
INSERT INTO testUpdate VALUES(130120390250,2,'H8033');
INSERT INTO testUpdate VALUES(130120390350,3,'H1732');
INSERT INTO testUpdate VALUES(130120390450,4,'H3117');
INSERT INTO testUpdate VALUES(130120390470,NULL,'punct3');
INSERT INTO testUpdate VALUES(130120390550,5,'H7969');
INSERT INTO testUpdate VALUES(130120390650,6,'H398');
INSERT INTO testUpdate VALUES(130120390750,7,'H8354');
INSERT INTO testUpdate VALUES(130120390770,NULL,'punct2');
INSERT INTO testUpdate VALUES(130120390850,8,'');
INSERT INTO testUpdate VALUES(130120390950,9,'H3559');
INSERT INTO testUpdate VALUES(130120391050,10,'');
INSERT INTO testUpdate VALUES(130120391150,11,'H251');
INSERT INTO testUpdate VALUES(130120391250,12,'');
COMMIT;
sqlite> with fixes as
(select indexInter, code
from testUpdate
where indexInter > 2)
select indexRow, indexInter, code,
(select code from fixes where fixes.indexInter+1 = testUpdate.indexInter) as edit
from testUpdate;
indexRow indexInter code edit
------------ ---------- ------ -----
130120390150 1 H1961
130120390250 2 H8033
130120390350 3 H1732
130120390450 4 H3117 H1732
130120390470 punct3
130120390550 5 H7969 H3117
130120390650 6 H398 H7969
130120390750 7 H8354 H398
130120390770 punct2
130120390850 8 H8354
130120390950 9 H3559
130120391050 10 H3559
130120391150 11 H251
130120391250 12 H251
sqlite> begin transaction;
sqlite> with fixes as
(select indexInter, code
from testUpdate
where indexInter > 2)
update testUpdate
set code = (select code
from fixes
where fixes.indexInter+1 = testUpdate.indexInter)
where indexInter > 3 returning *;
indexRow indexInter code
------------ ---------- -----
130120390450 4 H1732
130120390550 5 H1732
130120390650 6 H1732
130120390750 7 H1732
130120390850 8 H1732
130120390950 9 H1732
130120391050 10 H1732
130120391150 11 H1732
130120391250 12 H1732
sqlite> rollback;
(2) By Gary (1codedebugger) on 2022-12-13 01:19:00 in reply to 1 [link] [source]
Got it with this but why doesn't it work the first way?
sqlite> begin transaction;
sqlite> with fixes as
(select indexInter, code
from testUpdate where indexInter > 2)
update testUpdate
set code = fixes.code
from fixes
where fixes.indexInter+1 = testUpdate.indexInter
and testUpdate.indexInter > 3
returning *;
indexRow indexInter code
------------ ---------- -----
130120390450 4 H1732
130120390550 5 H3117
130120390650 6 H7969
130120390750 7 H398
130120390850 8 H8354
130120390950 9
130120391050 10 H3559
130120391150 11
130120391250 12 H251
sqlite> select * from testUpdate;
indexRow indexInter code
------------ ---------- ------
130120390150 1 H1961
130120390250 2 H8033
130120390350 3 H1732
130120390450 4 H1732
130120390470 punct3
130120390550 5 H3117
130120390650 6 H7969
130120390750 7 H398
130120390770 punct2
130120390850 8 H8354
130120390950 9
130120391050 10 H3559
130120391150 11
130120391250 12 H251
(3.1) By Keith Medcalf (kmedcalf) on 2022-12-13 02:32:20 edited from 3.0 in reply to 2 [link] [source]
Deleted(4) By Keith Medcalf (kmedcalf) on 2022-12-13 02:32:11 in reply to 3.0 [link] [source]
Scrap that. WHat version are you using? The current tip version works properly if the temp table is materialized.
sqlite> with fixes as materialized
...> (select indexInter, code
(x1...> from testUpdate
(x1...> where indexInter > 2)
...> update testUpdate
...> set code = (select code
(x1...> from fixes
(x1...> where fixes.indexInter+1 = testUpdate.indexInter)
...> where indexInter > 3 returning *;
QUERY PLAN
|--SCAN testUpdate (~983040 rows)
`--CORRELATED SCALAR SUBQUERY 2
|--MATERIALIZE fixes
| `--SCAN testUpdate (~983040 rows)
`--SCAN fixes (~245760 rows)
┌──────────────┬────────────┬─────────┐
│ indexRow │ indexInter │ code │
├──────────────┼────────────┼─────────┤
│ 130120390450 │ 4 │ 'H1732' │
│ 130120390550 │ 5 │ 'H3117' │
│ 130120390650 │ 6 │ 'H7969' │
│ 130120390750 │ 7 │ 'H398' │
│ 130120390850 │ 8 │ 'H8354' │
│ 130120390950 │ 9 │ '' │
│ 130120391050 │ 10 │ 'H3559' │
│ 130120391150 │ 11 │ '' │
│ 130120391250 │ 12 │ 'H251' │
└──────────────┴────────────┴─────────┘
But not if the CTE is not materialized.
sqlite> with fixes as
...> (select indexInter, code
(x1...> from testUpdate
(x1...> where indexInter > 2)
...> update testUpdate
...> set code = (select code
(x1...> from fixes
(x1...> where fixes.indexInter+1 = testUpdate.indexInter)
...> where indexInter > 3 returning *;
QUERY PLAN
|--SCAN testUpdate (~983040 rows)
`--CORRELATED SCALAR SUBQUERY 2
`--SCAN testUpdate (~262144 rows)
┌──────────────┬────────────┬─────────┐
│ indexRow │ indexInter │ code │
├──────────────┼────────────┼─────────┤
│ 130120390450 │ 4 │ 'H1732' │
│ 130120390550 │ 5 │ 'H1732' │
│ 130120390650 │ 6 │ 'H1732' │
│ 130120390750 │ 7 │ 'H1732' │
│ 130120390850 │ 8 │ 'H1732' │
│ 130120390950 │ 9 │ 'H1732' │
│ 130120391050 │ 10 │ 'H1732' │
│ 130120391150 │ 11 │ 'H1732' │
│ 130120391250 │ 12 │ 'H1732' │
└──────────────┴────────────┴─────────┘
contrast
sqlite> with fixes as
...> (select indexInter, code
(x1...> from testUpdate where indexInter > 2)
...> update testUpdate
...> set code = fixes.code
...> from fixes
...> where fixes.indexInter+1 = testUpdate.indexInter
...> and testUpdate.indexInter > 3 returning *;
QUERY PLAN
|--SCAN testUpdate (~983040 rows)
`--SEARCH testUpdate USING AUTOMATIC PARTIAL COVERING INDEX (indexInter=?) (~20 rows)
┌──────────────┬────────────┬─────────┐
│ indexRow │ indexInter │ code │
├──────────────┼────────────┼─────────┤
│ 130120390450 │ 4 │ 'H1732' │
│ 130120390550 │ 5 │ 'H3117' │
│ 130120390650 │ 6 │ 'H7969' │
│ 130120390750 │ 7 │ 'H398' │
│ 130120390850 │ 8 │ 'H8354' │
│ 130120390950 │ 9 │ '' │
│ 130120391050 │ 10 │ 'H3559' │
│ 130120391150 │ 11 │ '' │
│ 130120391250 │ 12 │ 'H251' │
└──────────────┴────────────┴─────────┘
You will note that when using UPDATE FROM the CTE is scanned AND THEN the updates are applied. Same when you materialize the CTE. However, when the CTE is not materialized, the update confounds itself.
(5.2) By Keith Medcalf (kmedcalf) on 2022-12-13 02:52:12 edited from 5.1 in reply to 4 [link] [source]
In other words, when the query is:
sqlite> with fixes as
...> (select indexInter, code
(x1...> from testUpdate
(x1...> where indexInter > 2)
...> update testUpdate
...> set code = (select code
(x1...> from fixes
(x1...> where fixes.indexInter+1 = testUpdate.indexInter)
...> where indexInter > 3 returning *;
what you are really doing is the following:
update testUpdate as o
set code = (select code
from testUpdate
where indexInter = o.indexInter - 1
and indexInter > 2
)
where indexInter > 3 returning *;
which, for each row in testUpdate where indexInter > 3 goes and looks up the value of code in testupdate where the indexInter is one less than the row being updated, and applies the update. Then go to the next row. Lather and repeat. THis means that the value of code is taken at each step from the previous indexInter (which was just updated with the value from the pervious indexInter (which was just updated from the row with the previous indexInter (...)))
This means that the single first replacement value is propagated to all updated records. WHat exactly happens depends, of course, on the available indexes, the traversal order chosen, the physical order of the records, etc.
In short, compute the updates AND THEN apply them. Do not do it all at once if the are all coming from the same table.
(6) By Gary (1codedebugger) on 2022-12-13 04:26:37 in reply to 5.2 [link] [source]
Thank you for the explanation. I think I understand what you're saying and how it fits the results generated; but I could not explain to anyone why the first update seems to ignore the CTE fixes in the subquery. Nonetheless, it's not important; the important item is that if I want to get correct results, I need to write the query in the proper manner. It seemed like a bad idea to have a subquery for every row needing an update but, at that point, I was more concerned with getting the data corrected than being proficient in SQL.
The version is:
sqlite> .version
SQLite 3.40.0 2022-11-16 12:10:08 89c459e766ea7e9165d0beeb124708b955a4950d0f4792f457465d71b158alt1
zlib version 1.2.13
gcc-12.2.0
Just out of curiosity, why does the query plan mention so many rows when there are 14 in the table?
QUERY PLAN
|--SCAN testUpdate (~983040 rows)
`--CORRELATED SCALAR SUBQUERY 2
`--SCAN testUpdate (~262144 rows)
If you don't mind another question: I was reading about what materialized means here at https://www.sqlite.org/lang_with.html under the heading "Materialization Hints" and it reads:
If the NOT MATERIALIZED phrase is used, then select-stmt is substituted as a subquery in place of every occurrence of the CTE table name. ... If neither hint is present, then SQLite 3.35.0 (2021-03-12) and later process the CTE as if ... the NOT MATERIALIZED phrase were present if the CTE is used only once.
Does this mean that
update testUpdate
set code = (select code
from fixes
where fixes.indexInter+1 = testUpdate.indexInter)
where indexInter > 3 returning *;
becomes
update testUpdate
set code = (select code
from (select indexInter, code
from testUpdate
where indexInter > 2) fixes
where fixes.indexInter+1 = testUpdate.indexInter)
where indexInter > 3 returning *;
If so, it would appear a bit confusing to determine which code and indexInter is to be used. I tried to run it as a select and it returns ambiguous column errors in multiple places. And, if I mess with it to corect that, it appears that the right side of "set code = " returns a list rather than a single value; and, perhaps, it picks the first row in the list, which here is H1732. But I'm speaking far beyond my skill and experience here.
(7) By Keith Medcalf (kmedcalf) on 2022-12-13 05:27:54 in reply to 6 [link] [source]
It seemed like a bad idea to have a subquery for every row needing an update
That is not a problem. Physics is the problem. THe update statement processes the rows selected by the UPDATE ... WHERE ... one after each that pass the WHERE conditions. When each "row" to be "updated" is found, the correlated subquery is run. This subquery runs against the very same table that is the target of the update. After the update is made (and is now visible to the update command itself) the update moves to the next row to be updated. It then goes and looks up the update value. By happenstance, the value it finds is the very one that it just updated. So it writes that value into the row being updated, and moves to the next row. THis process repeats, cascading the update from row to row (isolation only applies between connections, not on the same connection). Hence all the rows are updated to the same value.
Just out of curiosity, why does the query plan mention so many rows when there are 14 in the table?
ANALYZE has not been run so there are no statitics available and since no useful indexes are being used, the number of rows cannot be extimated with any accuracy. This means that default assumptions as to number of rows and selectivity is used.
If so, it would appear a bit confusing to determine which code and indexInter is to be used
Not at all. References in the projection refer preferentially to the tables being selected from. That is the subquery (select ... from testupdate ...) all references to bare columns (without a schema table referent) refer to the column in the table being selected from. THe only ambiguous name is the final indexInter wollowing the word WHERE because that name exists in BOTH referenced tables (the testUpdate being updated and the fixes table from the subquery projection).
However, the queries are not equivalent as you can tell from how they run and if you turn on eqp (.eqp on).
(9) By Gary (1codedebugger) on 2022-12-13 06:20:46 in reply to 7 [link] [source]
This subquery runs against the very same table that is the target of the update.
Thanks. I should've understood that the first time you explained it. In the first version, the CTE "fixes", since it is not materialized, is being run as a subquery for each row in the update query, and each run is against the newly updated version of the table being updated; and, therefore, it keeps "passing up(ward)" the same code column value from the indexInter row in the very first run of the fixes subquery. Now I understand.
May I ask two more question please? 1) Is UPDATE FROM preferrable to the CTE materialized with the set code = to a subquery? And, 2) unless one is aiming for one of those optimizations mentioned in the Materialization Hints in the documentation on WITH, is it best/safer to always use materialize in a CTE? Thanks.
(10) By Keith Medcalf (kmedcalf) on 2022-12-13 14:47:02 in reply to 9 [source]
Neither is preferable over the other and each has a different set of optimizations that are applied and therefore have different implementation methods and perhaps results. EXPLAIN and EXPLAIN QUERY PLAN are your friends that tell you exactly and precisely what is happening.
(8) By Keith Medcalf (kmedcalf) on 2022-12-13 05:36:57 in reply to 6 [link] [source]
And no, the equivalent of:
with fixes as
(
select indexInter,
code
from testUpdate
where indexInter > 2
)
update testUpdate
set code = fixes.code
from fixes
where fixes.indexInter+1 = testUpdate.indexInter
and testUpdate.indexInter > 3
returning *;
would be:
update testUpdate
set code = fixes.code
from (
select indexInter,
code
from testUpdate
where indexInter > 2
) fixes
where fixes.indexInter+1 = testUpdate.indexInter
and testUpdate.indexINter > 3
returning *;
Note that it is perfectly clear that you need to compute "fixes" before applying the updates in both of these cases.