Is there a way to optimize this UPDATE command to run faster?
(1.2) By PazO (JohnSmith) on 2021-05-27 19:10:38 edited from 1.1 [link]
Hello, Assuming I have the following table: <code> CREATE TABLE SnapshotParameters( snapshot_id integer, object_id integer, param_id integer, param_value integer ); </code> This table contains about 24K records per one snapshot.<br> Assuming I want to copy values from one snapshot to the other and using the following command: <code> UPDATE SnapshotParameters AS dst SET param_value=src.param_value FROM SnapshotParameters AS src WHERE dst.object_id==src.object_id AND dst.param_id==src.param_id AND dst.snapshot_id==3 AND src.snapshot_id==5; </code> This command works fine but the problem is that it takes ~3.5 seconds to complete. Is there a way to optimize this UPDATE command to run faster? Many thanks, PazO
(2) By Harald Hanche-Olsen (hanche) on 2021-05-27 14:21:38 in reply to 1.0 [link]
I have no idea, but would just point out that the table lacks the `chainer_id` column that appears in your query.
(3) By PazO (JohnSmith) on 2021-05-27 14:37:17 in reply to 2 [link]
Yes. Fixed this now (was suppose to be 'object_id')
(5) By anonymous on 2021-05-27 16:23:46 in reply to 3 [link]
Can you try with a preceeding WITH clause? It can also be forced to materialize. Maybe this can help.
(4) By David Raymond (dvdraymond) on 2021-05-27 16:21:22 in reply to 1.1 [link]
param_id doesn't seem to be showing up anywhere in your update query. Is that right, or should there also be a bit saying the param_id's are equal? If you've got more than one param_id per object_id then you're potentially messing things up I believe.
(7) By PazO (JohnSmith) on 2021-05-27 17:30:41 in reply to 4 [link]
Yes, you are correct, I forgot to add this: `AND dst.param_id==src.param_id`
(6) By Bill Wade (billwade) on 2021-05-27 16:36:29 in reply to 1.1
You'd expect faster results with an index on snapshot_id or object_id (preferably whichever of those has fewer duplicates). One could imagine the current plan is roughly - Look at every record until you find snapshot_id==3 -- Look at every record until you find matching object_id and snapshot_id==5. Update dst if/when you find the match. That outer loop executes 24k times, and the inner loop might execute 24k times for each call of the outer loop, for a total of 24k*24k = half a billion times. If there are very few duplicate object_id's, an index on object_id substantially reduces the work for the inner loop. If there are very few duplicate snapshot_id, an index there substantially reduces the work for both loops.
(8) By Keith Medcalf (kmedcalf) on 2021-05-27 17:55:31 in reply to 1.1 [link]
Create a unique index on the combination of columns in the join (snapshot_id, object_id) in that order so that the processing can find all the rows which have snapshot == 5 and use the (3, object_id) to find the row to update. Without that index one or the other (or both) devolve into a table scans.
(9) By PazO (JohnSmith) on 2021-05-27 19:33:23 in reply to 8 [link]
Thank you for the reply. In my table I have the following indexes: * Primary Key: (snapshot_id, object_id, param_id) * Normal: (snapshot_id, object_id) * Normal: (snapshot_id) Please note that (snapshot_id, object_id) cannot be unique as a single object in a snapshot has ~200 parameters. Another solution I considered is value-based:<br> I found out that about 80% of our parameters are holding the value zero (the data is settings of audio knobs and controls, and the value zero is very popular). I considered the solution of not storing zero in the DB. It adds some logic mechanism but it is solve-able. This can significantly speed up operations done on [groups of] snapshots: merge, copy, paste, etc.
(10) By Richard Damon (RichardDamon) on 2021-05-28 00:46:35 in reply to 9 [link]
I don't think you need the second two indexes, as the first index can do everything that the others can. Maybe they make lookups a bit faster as they are smaller, but need to be updated for every new insertion slowing that down.
(11) By Bernardo Ramos (kroggen) on 2021-06-06 18:56:14 in reply to 10 [link]
I agree. And we can go even further: make the table WITHOUT ROWID, so the engine will use a single Btree CREATE TABLE SnapshotParameters( snapshot_id integer, object_id integer, param_id integer, param_value integer, PRIMARY KEY(snapshot_id, object_id, param_id) ) WITHOUT ROWID No external indexes needed
(12) By Bernardo Ramos (kroggen) on 2021-06-06 18:57:15 in reply to 11 [link]
I agree. And we can go even further: make the table WITHOUT ROWID, so the engine will use a single Btree ``` CREATE TABLE SnapshotParameters( snapshot_id integer, object_id integer, param_id integer, param_value integer, PRIMARY KEY(snapshot_id, object_id, param_id) ) WITHOUT ROWID ``` No external indexes needed
(13) By Bernardo Ramos (kroggen) on 2021-06-06 19:00:58 in reply to 12 [link]
By the way, the Delete button (to delete my post) is not working on this forum
(14) By Larry Brasfield (larrybr) on 2021-06-06 21:31:23 in reply to 13 [link]
After some period of benign participation in this forum, non-anonymous posters are taken out of the subject-to-moderation category. Then, deletion (which is a form of editing) is allowed. Patience should suffice.