SQLite User Forum

Inconsistent values in trigger from UPSERT
Login

Inconsistent values in trigger from UPSERT

(1) By anonymous on 2024-04-11 20:39:45 [source]

Let's say I have a table of some redundant BLOBs and now I want to deduplicate them. I create a new table t2 with a unique hash field and a trigger to detect hash collisions, however unlikely they may be:

CREATE TABLE t1 (id INTEGER PRIMARY KEY, hash BLOB NOT NULL, data BLOB NOT NULL);
INSERT INTO t1 VALUES(1,unhex(format('31%.62c', '0')),format('%.36760c', 'a'));
INSERT INTO t1 VALUES(2,unhex(format('31%.62c', '0')),format('%.36760c', 'a'));
INSERT INTO t1 VALUES(3,unhex(format('32%.62c', '0')),format('%.36760c', 'b'));
INSERT INTO t1 VALUES(4,unhex(format('32%.62c', '0')),format('%.36760c', 'b'));

CREATE TABLE t2 (id INTEGER PRIMARY KEY, hash BLOB UNIQUE NOT NULL, data BLOB NOT NULL);
CREATE TABLE collisions (oldhash, newhash, old, new);
CREATE TRIGGER t2_collision BEFORE UPDATE OF data ON t2
WHEN old.hash = new.hash AND old.data != new.data
BEGIN
  INSERT INTO collisions (oldhash, newhash, old, new) VALUES (old.hash, new.hash, old.data, new.data);
END;

Now I try to move the data to the new table:

INSERT INTO t2 (hash, data) SELECT hash, data FROM t1 WHERE true ON CONFLICT (hash) DO UPDATE SET data = excluded.data;
SELECT * FROM collisions;

Surprisingly, I notice that while the data has been moved, there is a single collision: the old field consists of 4008 a characters followed by 32752 b characters (no such value is present in the original t1 table), while the new field is 36760 b characters, and both hash fields contain hashes for the new value.

I've tested this on fiddle (3.46.0 2024-04-08 11:50:07) and my local build (3.45.1). The counts are slightly different on my local build (there's 451 bs instead of 4008) but the behavior is otherwise consistent.

I'm not sure why this is happening. Is it expected behavior? Am I misunderstanding something?

(2.1) By Richard Hipp (drh) on 2024-04-11 23:42:19 edited from 2.0 in reply to 1 [link] [source]

UPSERT is implements as a special kind of transient TRIGGER. So in order for an UPSERT to fire a TRIGGER, you have to say:

PRAGMA recursive_triggers=ON;

Never mind the above. I was doing something wrong, apparently...

(3) By anonymous on 2024-04-11 23:56:28 in reply to 2.1 [link] [source]

Just in case my initial post wasn't clear: there are no collisions in the test data (t1) so the trigger should never fire, but according to the content of the collisions table it did fire once with the old.data field set to a value that wasn't present in any of the tables.

(4) By Richard Hipp (drh) on 2024-04-12 01:39:13 in reply to 3 [link] [source]

This seems to be a malfunction in UPSERT. The error goes back to version 3.24.0 (2018-06-04) when UPSERT was first added. I'm trying to isolate the cause of the problem now. Probably it will be fixed (on trunk at least) within a day or two.

(5) By Richard Hipp (drh) on 2024-04-12 15:53:16 in reply to 1 [link] [source]

This problem should now be resolved on the latest trunk check-ins and on the latest check-ins for branch-3.45.

(6) By anonymous on 2024-04-12 16:26:57 in reply to 5 [link] [source]

Thank you for the quick fix. I can confirm the issue no longer occurs on the test data and my actual DB.