SQLite User Forum

Novice question concerning why a select query provides desired results but similar update query does not?
Login

Novice question concerning why a select query provides desired results but similar update query does not?

(1) By Gary (1codedebugger) on 2022-12-13 01:12:01 [link] [source]

I'm trying to update a table to correct an error in the data that was loaded into SQLite. This is an example of the issue.

A code was skipped in arranging the data to be loaded, such that, beginning at indexInter = 4, the column "code" is one row out-of-sync (behind) with the rest of the data in each row. If I run the example select query below, the results are what is needed; but, when that query is modified to an update, all rows are updated to the first matching row.

Woudl you please tell me what novice error I am making and fundamental principle I am overlooking? Thank you.

sqlite> .dump testUpdate
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE testUpdate (indexRow integer unique, indexInter integer, code text);
INSERT INTO testUpdate VALUES(130120390150,1,'H1961');
INSERT INTO testUpdate VALUES(130120390250,2,'H8033');
INSERT INTO testUpdate VALUES(130120390350,3,'H1732');
INSERT INTO testUpdate VALUES(130120390450,4,'H3117');
INSERT INTO testUpdate VALUES(130120390470,NULL,'punct3');
INSERT INTO testUpdate VALUES(130120390550,5,'H7969');
INSERT INTO testUpdate VALUES(130120390650,6,'H398');
INSERT INTO testUpdate VALUES(130120390750,7,'H8354');
INSERT INTO testUpdate VALUES(130120390770,NULL,'punct2');
INSERT INTO testUpdate VALUES(130120390850,8,'');
INSERT INTO testUpdate VALUES(130120390950,9,'H3559');
INSERT INTO testUpdate VALUES(130120391050,10,'');
INSERT INTO testUpdate VALUES(130120391150,11,'H251');
INSERT INTO testUpdate VALUES(130120391250,12,'');
COMMIT;

sqlite> with fixes as 
  (select indexInter, code
   from testUpdate
   where indexInter > 2)
  select indexRow, indexInter, code,
      (select code from fixes where fixes.indexInter+1 = testUpdate.indexInter) as edit 
  from testUpdate;
indexRow      indexInter  code    edit 
------------  ----------  ------  -----
130120390150  1           H1961        
130120390250  2           H8033        
130120390350  3           H1732        
130120390450  4           H3117   H1732
130120390470              punct3       
130120390550  5           H7969   H3117
130120390650  6           H398    H7969
130120390750  7           H8354   H398 
130120390770              punct2       
130120390850  8                   H8354
130120390950  9           H3559        
130120391050  10                  H3559
130120391150  11          H251         
130120391250  12                  H251 


sqlite> begin transaction;
sqlite> with fixes as 
  (select indexInter, code
   from testUpdate
   where indexInter > 2)
  update testUpdate
  set code = (select code
              from fixes
              where fixes.indexInter+1 = testUpdate.indexInter)
  where indexInter > 3 returning *;
indexRow      indexInter  code 
------------  ----------  -----
130120390450  4           H1732
130120390550  5           H1732
130120390650  6           H1732
130120390750  7           H1732
130120390850  8           H1732
130120390950  9           H1732
130120391050  10          H1732
130120391150  11          H1732
130120391250  12          H1732
sqlite> rollback;

(2) By Gary (1codedebugger) on 2022-12-13 01:19:00 in reply to 1 [link] [source]

Got it with this but why doesn't it work the first way?

sqlite> begin transaction;
sqlite> with fixes as 
  (select indexInter, code
   from testUpdate where indexInter > 2)
  update testUpdate
  set code = fixes.code
  from fixes
  where fixes.indexInter+1 = testUpdate.indexInter
   and testUpdate.indexInter > 3
 returning *;
indexRow      indexInter  code 
------------  ----------  -----
130120390450  4           H1732
130120390550  5           H3117
130120390650  6           H7969
130120390750  7           H398 
130120390850  8           H8354
130120390950  9                
130120391050  10          H3559
130120391150  11               
130120391250  12          H251 
sqlite> select * from testUpdate;
indexRow      indexInter  code  
------------  ----------  ------
130120390150  1           H1961 
130120390250  2           H8033 
130120390350  3           H1732 
130120390450  4           H1732 
130120390470              punct3
130120390550  5           H3117 
130120390650  6           H7969 
130120390750  7           H398  
130120390770              punct2
130120390850  8           H8354 
130120390950  9                 
130120391050  10          H3559 
130120391150  11                
130120391250  12          H251  

(3.1) By Keith Medcalf (kmedcalf) on 2022-12-13 02:32:20 edited from 3.0 in reply to 2 [link] [source]

Deleted

(4) By Keith Medcalf (kmedcalf) on 2022-12-13 02:32:11 in reply to 3.0 [link] [source]

Scrap that. WHat version are you using? The current tip version works properly if the temp table is materialized.

sqlite> with fixes as materialized
   ...>   (select indexInter, code
(x1...>    from testUpdate
(x1...>    where indexInter > 2)
   ...>   update testUpdate
   ...> set code = (select code
(x1...>               from fixes
(x1...>               where fixes.indexInter+1 = testUpdate.indexInter)
   ...>   where indexInter > 3 returning *;
QUERY PLAN
|--SCAN testUpdate (~983040 rows)
`--CORRELATED SCALAR SUBQUERY 2
   |--MATERIALIZE fixes
   |  `--SCAN testUpdate (~983040 rows)
   `--SCAN fixes (~245760 rows)
┌──────────────┬────────────┬─────────┐
│   indexRow   │ indexInter │  code   │
├──────────────┼────────────┼─────────┤
│ 130120390450 │ 4          │ 'H1732' │
│ 130120390550 │ 5          │ 'H3117' │
│ 130120390650 │ 6          │ 'H7969' │
│ 130120390750 │ 7          │ 'H398'  │
│ 130120390850 │ 8          │ 'H8354' │
│ 130120390950 │ 9          │ ''      │
│ 130120391050 │ 10         │ 'H3559' │
│ 130120391150 │ 11         │ ''      │
│ 130120391250 │ 12         │ 'H251'  │
└──────────────┴────────────┴─────────┘

But not if the CTE is not materialized.

sqlite> with fixes as
   ...>   (select indexInter, code
(x1...>    from testUpdate
(x1...>    where indexInter > 2)
   ...>   update testUpdate
   ...>   set code = (select code
(x1...>               from fixes
(x1...>               where fixes.indexInter+1 = testUpdate.indexInter)
   ...>   where indexInter > 3 returning *;
QUERY PLAN
|--SCAN testUpdate (~983040 rows)
`--CORRELATED SCALAR SUBQUERY 2
   `--SCAN testUpdate (~262144 rows)
┌──────────────┬────────────┬─────────┐
│   indexRow   │ indexInter │  code   │
├──────────────┼────────────┼─────────┤
│ 130120390450 │ 4          │ 'H1732' │
│ 130120390550 │ 5          │ 'H1732' │
│ 130120390650 │ 6          │ 'H1732' │
│ 130120390750 │ 7          │ 'H1732' │
│ 130120390850 │ 8          │ 'H1732' │
│ 130120390950 │ 9          │ 'H1732' │
│ 130120391050 │ 10         │ 'H1732' │
│ 130120391150 │ 11         │ 'H1732' │
│ 130120391250 │ 12         │ 'H1732' │
└──────────────┴────────────┴─────────┘

contrast

sqlite> with fixes as
   ...>   (select indexInter, code
(x1...>    from testUpdate where indexInter > 2)
   ...>   update testUpdate
   ...>   set code = fixes.code
   ...>   from fixes
   ...>   where fixes.indexInter+1 = testUpdate.indexInter
   ...>    and testUpdate.indexInter > 3 returning *;
QUERY PLAN
|--SCAN testUpdate (~983040 rows)
`--SEARCH testUpdate USING AUTOMATIC PARTIAL COVERING INDEX (indexInter=?) (~20 rows)
┌──────────────┬────────────┬─────────┐
│   indexRow   │ indexInter │  code   │
├──────────────┼────────────┼─────────┤
│ 130120390450 │ 4          │ 'H1732' │
│ 130120390550 │ 5          │ 'H3117' │
│ 130120390650 │ 6          │ 'H7969' │
│ 130120390750 │ 7          │ 'H398'  │
│ 130120390850 │ 8          │ 'H8354' │
│ 130120390950 │ 9          │ ''      │
│ 130120391050 │ 10         │ 'H3559' │
│ 130120391150 │ 11         │ ''      │
│ 130120391250 │ 12         │ 'H251'  │
└──────────────┴────────────┴─────────┘

You will note that when using UPDATE FROM the CTE is scanned AND THEN the updates are applied. Same when you materialize the CTE. However, when the CTE is not materialized, the update confounds itself.

(5.2) By Keith Medcalf (kmedcalf) on 2022-12-13 02:52:12 edited from 5.1 in reply to 4 [link] [source]

In other words, when the query is:

sqlite> with fixes as
   ...>   (select indexInter, code
(x1...>    from testUpdate
(x1...>    where indexInter > 2)
   ...>   update testUpdate
   ...>   set code = (select code
(x1...>               from fixes
(x1...>               where fixes.indexInter+1 = testUpdate.indexInter)
   ...>   where indexInter > 3 returning *;

what you are really doing is the following:

update testUpdate as o
   set code = (select code
                 from testUpdate
                where indexInter = o.indexInter - 1
                  and indexInter > 2
              )
 where indexInter > 3 returning *;

which, for each row in testUpdate where indexInter > 3 goes and looks up the value of code in testupdate where the indexInter is one less than the row being updated, and applies the update. Then go to the next row. Lather and repeat. THis means that the value of code is taken at each step from the previous indexInter (which was just updated with the value from the pervious indexInter (which was just updated from the row with the previous indexInter (...)))

This means that the single first replacement value is propagated to all updated records. WHat exactly happens depends, of course, on the available indexes, the traversal order chosen, the physical order of the records, etc.

In short, compute the updates AND THEN apply them. Do not do it all at once if the are all coming from the same table.

(6) By Gary (1codedebugger) on 2022-12-13 04:26:37 in reply to 5.2 [link] [source]

Thank you for the explanation. I think I understand what you're saying and how it fits the results generated; but I could not explain to anyone why the first update seems to ignore the CTE fixes in the subquery. Nonetheless, it's not important; the important item is that if I want to get correct results, I need to write the query in the proper manner. It seemed like a bad idea to have a subquery for every row needing an update but, at that point, I was more concerned with getting the data corrected than being proficient in SQL.

The version is:

sqlite> .version
SQLite 3.40.0 2022-11-16 12:10:08 89c459e766ea7e9165d0beeb124708b955a4950d0f4792f457465d71b158alt1
zlib version 1.2.13
gcc-12.2.0
Just out of curiosity, why does the query plan mention so many rows when there are 14 in the table?
QUERY PLAN
|--SCAN testUpdate (~983040 rows)
`--CORRELATED SCALAR SUBQUERY 2
   `--SCAN testUpdate (~262144 rows)
If you don't mind another question: I was reading about what materialized means here at https://www.sqlite.org/lang_with.html under the heading "Materialization Hints" and it reads:

If the NOT MATERIALIZED phrase is used, then select-stmt is substituted as a subquery in place of every occurrence of the CTE table name. ... If neither hint is present, then SQLite 3.35.0 (2021-03-12) and later process the CTE as if ... the NOT MATERIALIZED phrase were present if the CTE is used only once.

Does this mean that

update testUpdate
set code = (select code
from fixes
where fixes.indexInter+1 = testUpdate.indexInter)
where indexInter > 3 returning *;
becomes
update testUpdate
set code = (select code
from (select indexInter, code
      from testUpdate
      where indexInter > 2) fixes
where fixes.indexInter+1 = testUpdate.indexInter)
where indexInter > 3 returning *;

If so, it would appear a bit confusing to determine which code and indexInter is to be used. I tried to run it as a select and it returns ambiguous column errors in multiple places. And, if I mess with it to corect that, it appears that the right side of "set code = " returns a list rather than a single value; and, perhaps, it picks the first row in the list, which here is H1732. But I'm speaking far beyond my skill and experience here.

(7) By Keith Medcalf (kmedcalf) on 2022-12-13 05:27:54 in reply to 6 [link] [source]

It seemed like a bad idea to have a subquery for every row needing an update

That is not a problem. Physics is the problem. THe update statement processes the rows selected by the UPDATE ... WHERE ... one after each that pass the WHERE conditions. When each "row" to be "updated" is found, the correlated subquery is run. This subquery runs against the very same table that is the target of the update. After the update is made (and is now visible to the update command itself) the update moves to the next row to be updated. It then goes and looks up the update value. By happenstance, the value it finds is the very one that it just updated. So it writes that value into the row being updated, and moves to the next row. THis process repeats, cascading the update from row to row (isolation only applies between connections, not on the same connection). Hence all the rows are updated to the same value.

Just out of curiosity, why does the query plan mention so many rows when there are 14 in the table?

ANALYZE has not been run so there are no statitics available and since no useful indexes are being used, the number of rows cannot be extimated with any accuracy. This means that default assumptions as to number of rows and selectivity is used.

If so, it would appear a bit confusing to determine which code and indexInter is to be used

Not at all. References in the projection refer preferentially to the tables being selected from. That is the subquery (select ... from testupdate ...) all references to bare columns (without a schema table referent) refer to the column in the table being selected from. THe only ambiguous name is the final indexInter wollowing the word WHERE because that name exists in BOTH referenced tables (the testUpdate being updated and the fixes table from the subquery projection).

However, the queries are not equivalent as you can tell from how they run and if you turn on eqp (.eqp on).

(9) By Gary (1codedebugger) on 2022-12-13 06:20:46 in reply to 7 [link] [source]

This subquery runs against the very same table that is the target of the update.

Thanks. I should've understood that the first time you explained it. In the first version, the CTE "fixes", since it is not materialized, is being run as a subquery for each row in the update query, and each run is against the newly updated version of the table being updated; and, therefore, it keeps "passing up(ward)" the same code column value from the indexInter row in the very first run of the fixes subquery. Now I understand.

May I ask two more question please? 1) Is UPDATE FROM preferrable to the CTE materialized with the set code = to a subquery? And, 2) unless one is aiming for one of those optimizations mentioned in the Materialization Hints in the documentation on WITH, is it best/safer to always use materialize in a CTE? Thanks.

(10) By Keith Medcalf (kmedcalf) on 2022-12-13 14:47:02 in reply to 9 [source]

Neither is preferable over the other and each has a different set of optimizations that are applied and therefore have different implementation methods and perhaps results. EXPLAIN and EXPLAIN QUERY PLAN are your friends that tell you exactly and precisely what is happening.

(8) By Keith Medcalf (kmedcalf) on 2022-12-13 05:36:57 in reply to 6 [link] [source]

And no, the equivalent of:

     with fixes as
          (
           select indexInter, 
                  code
             from testUpdate 
            where indexInter > 2
          )
   update testUpdate
      set code = fixes.code
     from fixes
    where fixes.indexInter+1 = testUpdate.indexInter
      and testUpdate.indexInter > 3 
returning *;

would be:

  update testUpdate
     set code = fixes.code
    from (
          select indexInter,
                 code
            from testUpdate
           where indexInter > 2
         ) fixes
    where fixes.indexInter+1 = testUpdate.indexInter
      and testUpdate.indexINter > 3
returning *;

Note that it is perfectly clear that you need to compute "fixes" before applying the updates in both of these cases.