Is there a way to optimize this UPDATE command to run faster?
(1.2) By PazO (JohnSmith) on 2021-05-27 19:10:38 edited from 1.1 [link] [source]
Hello,
Assuming I have the following table:
CREATE TABLE SnapshotParameters(
snapshot_id integer,
object_id integer,
param_id integer,
param_value integer
);
This table contains about 24K records per one snapshot.
Assuming I want to copy values from one snapshot to the other and using the following command:
UPDATE SnapshotParameters AS dst
SET param_value=src.param_value FROM SnapshotParameters AS src
WHERE dst.object_id==src.object_id AND
dst.param_id==src.param_id AND
dst.snapshot_id==3 AND
src.snapshot_id==5;
This command works fine but the problem is that it takes ~3.5 seconds to complete.
Is there a way to optimize this UPDATE command to run faster?
Many thanks,
PazO
(2) By Harald Hanche-Olsen (hanche) on 2021-05-27 14:21:38 in reply to 1.0 [link] [source]
I have no idea, but would just point out that the table lacks the chainer_id
column that appears in your query.
(3) By PazO (JohnSmith) on 2021-05-27 14:37:17 in reply to 2 [source]
Yes. Fixed this now (was suppose to be 'object_id')
(5) By anonymous on 2021-05-27 16:23:46 in reply to 3 [link] [source]
Can you try with a preceeding WITH clause? It can also be forced to materialize. Maybe this can help.
(4) By David Raymond (dvdraymond) on 2021-05-27 16:21:22 in reply to 1.1 [link] [source]
param_id doesn't seem to be showing up anywhere in your update query. Is that right, or should there also be a bit saying the param_id's are equal? If you've got more than one param_id per object_id then you're potentially messing things up I believe.
(7) By PazO (JohnSmith) on 2021-05-27 17:30:41 in reply to 4 [link] [source]
Yes, you are correct, I forgot to add this:
AND dst.param_id==src.param_id
(6) By Bill Wade (billwade) on 2021-05-27 16:36:29 in reply to 1.1 [link] [source]
You'd expect faster results with an index on snapshot_id or object_id (preferably whichever of those has fewer duplicates).
One could imagine the current plan is roughly
- Look at every record until you find snapshot_id==3 -- Look at every record until you find matching object_id and snapshot_id==5. Update dst if/when you find the match.
That outer loop executes 24k times, and the inner loop might execute 24k times for each call of the outer loop, for a total of 24k*24k = half a billion times.
If there are very few duplicate object_id's, an index on object_id substantially reduces the work for the inner loop.
If there are very few duplicate snapshot_id, an index there substantially reduces the work for both loops.
(8) By Keith Medcalf (kmedcalf) on 2021-05-27 17:55:31 in reply to 1.1 [link] [source]
Create a unique index on the combination of columns in the join (snapshot_id, object_id) in that order so that the processing can find all the rows which have snapshot == 5 and use the (3, object_id) to find the row to update. Without that index one or the other (or both) devolve into a table scans.
(9) By PazO (JohnSmith) on 2021-05-27 19:33:23 in reply to 8 [link] [source]
Thank you for the reply.
In my table I have the following indexes:
- Primary Key: (snapshot_id, object_id, param_id)
- Normal: (snapshot_id, object_id)
- Normal: (snapshot_id)
Please note that (snapshot_id, object_id) cannot be unique as a single object in a snapshot has ~200 parameters.
Another solution I considered is value-based:
I found out that about 80% of our parameters are holding the value zero (the data is settings of audio knobs and controls, and the value zero is very popular). I considered the solution of not storing zero in the DB. It adds some logic mechanism but it is solve-able. This can significantly speed up operations done on [groups of] snapshots: merge, copy, paste, etc.
(10) By Richard Damon (RichardDamon) on 2021-05-28 00:46:35 in reply to 9 [link] [source]
I don't think you need the second two indexes, as the first index can do everything that the others can. Maybe they make lookups a bit faster as they are smaller, but need to be updated for every new insertion slowing that down.
(11) By Bernardo Ramos (kroggen) on 2021-06-06 18:56:14 in reply to 10 [link] [source]
I agree. And we can go even further: make the table WITHOUT ROWID, so the engine will use a single Btree
CREATE TABLE SnapshotParameters( snapshot_id integer, object_id integer, param_id integer, param_value integer, PRIMARY KEY(snapshot_id, object_id, param_id) ) WITHOUT ROWID
No external indexes needed
(12) By Bernardo Ramos (kroggen) on 2021-06-06 18:57:15 in reply to 11 [link] [source]
I agree. And we can go even further: make the table WITHOUT ROWID, so the engine will use a single Btree
CREATE TABLE SnapshotParameters(
snapshot_id integer,
object_id integer,
param_id integer,
param_value integer,
PRIMARY KEY(snapshot_id, object_id, param_id)
) WITHOUT ROWID
No external indexes needed
(13) By Bernardo Ramos (kroggen) on 2021-06-06 19:00:58 in reply to 12 [link] [source]
By the way, the Delete button (to delete my post) is not working on this forum
(14) By Larry Brasfield (larrybr) on 2021-06-06 21:31:23 in reply to 13 [link] [source]
After some period of benign participation in this forum, non-anonymous posters are taken out of the subject-to-moderation category. Then, deletion (which is a form of editing) is allowed. Patience should suffice.