Is there a way to optimize this UPDATE command to run faster?

(1.2) By PazO (JohnSmith) on 2021-05-27 19:10:38 edited from 1.1 [link]

Hello, 

Assuming I have the following table:

<code>
 CREATE TABLE SnapshotParameters(
     snapshot_id   integer,
     object_id     integer,
     param_id      integer,
     param_value   integer
 );
 </code>


This table contains about 24K records per one snapshot.<br>
Assuming I want to copy values from one snapshot to the other and using the following command:

<code>
UPDATE	SnapshotParameters AS dst
            SET     	param_value=src.param_value FROM SnapshotParameters AS src
            WHERE	dst.object_id==src.object_id  AND
                        dst.param_id==src.param_id    AND
                        dst.snapshot_id==3            AND
                        src.snapshot_id==5;
</code>
				
This command works fine but the problem is that it takes ~3.5 seconds to complete.

Is there a way to optimize this UPDATE command to run faster?

Many thanks,

PazO

(2) By Harald Hanche-Olsen (hanche) on 2021-05-27 14:21:38 in reply to 1.0 [link]

I have no idea, but would just point out that the table lacks the `chainer_id` column that appears in your query.

(3) By PazO (JohnSmith) on 2021-05-27 14:37:17 in reply to 2 [link]

Yes. Fixed this now (was suppose to be 'object_id')

(5) By anonymous on 2021-05-27 16:23:46 in reply to 3 [link]

Can you try with a preceeding WITH clause? It can also be forced to materialize. Maybe this can help.

(4) By David Raymond (dvdraymond) on 2021-05-27 16:21:22 in reply to 1.1 [link]

param_id doesn't seem to be showing up anywhere in your update query. Is that right, or should there also be a bit saying the param_id's are equal? If you've got more than one param_id per object_id then you're potentially messing things up I believe.

(7) By PazO (JohnSmith) on 2021-05-27 17:30:41 in reply to 4 [link]

Yes, you are correct, I forgot to add this:

`AND dst.param_id==src.param_id`

(6) By Bill Wade (billwade) on 2021-05-27 16:36:29 in reply to 1.1

You'd expect faster results with an index on snapshot_id or object_id (preferably whichever of those has fewer duplicates).

One could imagine the current plan is roughly

- Look at every record until you find snapshot_id==3
-- Look at every record until you find matching object_id and snapshot_id==5. Update dst if/when you find the match.

That outer loop executes 24k times, and the inner loop might execute 24k times for each call of the outer loop, for a total of 24k*24k = half a billion times.

If there are very few duplicate object_id's, an index on object_id substantially reduces the work for the inner loop.

If there are very few duplicate snapshot_id, an index there substantially reduces the work for both loops.

(8) By Keith Medcalf (kmedcalf) on 2021-05-27 17:55:31 in reply to 1.1 [link]

Create a unique index on the combination of columns in the join (snapshot_id, object_id) in that order so that the processing can find all the rows which have snapshot == 5 and use the (3, object_id) to find the row to update.  Without that index one or the other (or both) devolve into a table scans.

(9) By PazO (JohnSmith) on 2021-05-27 19:33:23 in reply to 8 [link]

Thank you for the reply.

In my table I have the following indexes:

 * Primary Key: (snapshot_id, object_id, param_id)
 * Normal: (snapshot_id, object_id)
 * Normal: (snapshot_id)

Please note that (snapshot_id, object_id) cannot be unique as a single object in a snapshot has ~200 parameters.

Another solution I considered is value-based:<br>
I found out that about 80% of our parameters are holding the value zero (the data is settings of audio knobs and controls, and the value zero is very popular). I considered the solution of not storing zero in the DB. It adds some logic mechanism but it is solve-able. This can significantly speed up operations done on [groups of] snapshots: merge, copy, paste, etc.

(10) By Richard Damon (RichardDamon) on 2021-05-28 00:46:35 in reply to 9 [link]

I don't think you need the second two indexes, as the first index can do everything that the others can. Maybe they make lookups a bit faster as they are smaller, but need to be updated for every new insertion slowing that down.

(11) By Bernardo Ramos (kroggen) on 2021-06-06 18:56:14 in reply to 10 [link]

I agree. And we can go even further: make the table WITHOUT ROWID, so the engine will use a single Btree

 CREATE TABLE SnapshotParameters(
     snapshot_id   integer,
     object_id     integer,
     param_id      integer,
     param_value   integer,
     PRIMARY KEY(snapshot_id, object_id, param_id)
 ) WITHOUT ROWID
 
 No external indexes needed

(12) By Bernardo Ramos (kroggen) on 2021-06-06 18:57:15 in reply to 11 [link]

I agree. And we can go even further: make the table WITHOUT ROWID, so the engine will use a single Btree
```
 CREATE TABLE SnapshotParameters(
     snapshot_id   integer,
     object_id     integer,
     param_id      integer,
     param_value   integer,
     PRIMARY KEY(snapshot_id, object_id, param_id)
 ) WITHOUT ROWID
 ```
 No external indexes needed

(13) By Bernardo Ramos (kroggen) on 2021-06-06 19:00:58 in reply to 12 [link]

By the way, the Delete button (to delete my post) is not working on this forum

(14) By Larry Brasfield (larrybr) on 2021-06-06 21:31:23 in reply to 13 [link]

After some period of benign participation in this forum, non-anonymous posters are taken out of the subject-to-moderation category. Then, deletion (which is a form of editing) is allowed. Patience should suffice.