SQLite Forum

Is there a way to optimize this UPDATE command to run faster?
Login

Is there a way to optimize this UPDATE command to run faster?

(1.2) By PazO (JohnSmith) on 2021-05-27 19:10:38 edited from 1.1 [link] [source]

Hello,

Assuming I have the following table:

CREATE TABLE SnapshotParameters( snapshot_id integer, object_id integer, param_id integer, param_value integer );

This table contains about 24K records per one snapshot.
Assuming I want to copy values from one snapshot to the other and using the following command:

UPDATE SnapshotParameters AS dst SET param_value=src.param_value FROM SnapshotParameters AS src WHERE dst.object_id==src.object_id AND dst.param_id==src.param_id AND dst.snapshot_id==3 AND src.snapshot_id==5;

This command works fine but the problem is that it takes ~3.5 seconds to complete.

Is there a way to optimize this UPDATE command to run faster?

Many thanks,

PazO

(2) By Harald Hanche-Olsen (hanche) on 2021-05-27 14:21:38 in reply to 1.0 [link] [source]

I have no idea, but would just point out that the table lacks the chainer_id column that appears in your query.

(3) By PazO (JohnSmith) on 2021-05-27 14:37:17 in reply to 2 [link] [source]

Yes. Fixed this now (was suppose to be 'object_id')

(5) By anonymous on 2021-05-27 16:23:46 in reply to 3 [link] [source]

Can you try with a preceeding WITH clause? It can also be forced to materialize. Maybe this can help.

(4) By David Raymond (dvdraymond) on 2021-05-27 16:21:22 in reply to 1.1 [source]

param_id doesn't seem to be showing up anywhere in your update query. Is that right, or should there also be a bit saying the param_id's are equal? If you've got more than one param_id per object_id then you're potentially messing things up I believe.

(7) By PazO (JohnSmith) on 2021-05-27 17:30:41 in reply to 4 [link] [source]

Yes, you are correct, I forgot to add this:

AND dst.param_id==src.param_id

(6) By Bill Wade (billwade) on 2021-05-27 16:36:29 in reply to 1.1 [link] [source]

You'd expect faster results with an index on snapshot_id or object_id (preferably whichever of those has fewer duplicates).

One could imagine the current plan is roughly

  • Look at every record until you find snapshot_id==3 -- Look at every record until you find matching object_id and snapshot_id==5. Update dst if/when you find the match.

That outer loop executes 24k times, and the inner loop might execute 24k times for each call of the outer loop, for a total of 24k*24k = half a billion times.

If there are very few duplicate object_id's, an index on object_id substantially reduces the work for the inner loop.

If there are very few duplicate snapshot_id, an index there substantially reduces the work for both loops.

(8) By Keith Medcalf (kmedcalf) on 2021-05-27 17:55:31 in reply to 1.1 [link] [source]

Create a unique index on the combination of columns in the join (snapshot_id, object_id) in that order so that the processing can find all the rows which have snapshot == 5 and use the (3, object_id) to find the row to update. Without that index one or the other (or both) devolve into a table scans.

(9) By PazO (JohnSmith) on 2021-05-27 19:33:23 in reply to 8 [link] [source]

Thank you for the reply.

In my table I have the following indexes:

  • Primary Key: (snapshot_id, object_id, param_id)
  • Normal: (snapshot_id, object_id)
  • Normal: (snapshot_id)

Please note that (snapshot_id, object_id) cannot be unique as a single object in a snapshot has ~200 parameters.

Another solution I considered is value-based:
I found out that about 80% of our parameters are holding the value zero (the data is settings of audio knobs and controls, and the value zero is very popular). I considered the solution of not storing zero in the DB. It adds some logic mechanism but it is solve-able. This can significantly speed up operations done on [groups of] snapshots: merge, copy, paste, etc.

(10) By Richard Damon (RichardDamon) on 2021-05-28 00:46:35 in reply to 9 [link] [source]

I don't think you need the second two indexes, as the first index can do everything that the others can. Maybe they make lookups a bit faster as they are smaller, but need to be updated for every new insertion slowing that down.

(11) By Bernardo Ramos (kroggen) on 2021-06-06 18:56:14 in reply to 10 [link] [source]

I agree. And we can go even further: make the table WITHOUT ROWID, so the engine will use a single Btree

CREATE TABLE SnapshotParameters( snapshot_id integer, object_id integer, param_id integer, param_value integer, PRIMARY KEY(snapshot_id, object_id, param_id) ) WITHOUT ROWID

No external indexes needed

(12) By Bernardo Ramos (kroggen) on 2021-06-06 18:57:15 in reply to 11 [link] [source]

I agree. And we can go even further: make the table WITHOUT ROWID, so the engine will use a single Btree

 CREATE TABLE SnapshotParameters(
     snapshot_id   integer,
     object_id     integer,
     param_id      integer,
     param_value   integer,
     PRIMARY KEY(snapshot_id, object_id, param_id)
 ) WITHOUT ROWID
No external indexes needed

(13) By Bernardo Ramos (kroggen) on 2021-06-06 19:00:58 in reply to 12 [link] [source]

By the way, the Delete button (to delete my post) is not working on this forum

(14) By Larry Brasfield (larrybr) on 2021-06-06 21:31:23 in reply to 13 [link] [source]

After some period of benign participation in this forum, non-anonymous posters are taken out of the subject-to-moderation category. Then, deletion (which is a form of editing) is allowed. Patience should suffice.