Efficiency of updates versus replace when using the JSON1 extension functions.

(1) By Gary (1codedebugger) on 2020-11-05 05:28:40 [link] [source]

I'm trying to understand what takes place when a JSON1 extension function is used to update a property in a JSON string stored in a data table, compared to simply overwriting the row.

The reason I am asking is that I'm moving a web extension's storage from the indexedDB API to a C utility using SQLite3 and the browser's native-messaging API.

Because of the manner in which the garbage collector did not release RAM until a certain threshold had been consumed, it was more efficient to hold a certain amount of data in RAM and use it to overwrite the existing database records through a put operation only, as opposed to a get/put combination, in which the record was first retrieved, then de-serialized, updated, serialized, and put back in the database.

Would something similar be true in SQLite using the JSON functions for an update, such that, to update some properties of the string, the record would be selected, parsed, updated, and written back; and just replacing the full entry would be less work?

If true, then there appears to be two options: 1) pass the complete new JSON string from the browser to the utility and perform a less-intensive SQLite operation, or 2) pass only the data that needs updated and perform a more-intensive SQLite operation.

Is this an accurate view of the SQLite side of the process?

Thank you.

(2) By Keith Medcalf (kmedcalf) on 2020-11-05 06:13:10 in reply to 1 [source]

You are safe to assume that when you update something the entire record must be retrieved and a new entire record must be composed and that the entire record is re-written.

That is to say that if you have a record with 47 fields and you update one of them, then the entire record of 47 fields must be retrieved, the one you want to update is updated, then the entire record of 47 fields is re-composed into a record structure, and that record is re-written.

There are optimizations which may occur to prevent pages on which the content has not changed from being re-written, but this applies ONLY if the entire page after image is exactly the same as the entire page before image.

There are also optimizations which may take place to minimize the "shuffling" of data in and out of overflow -- however you should assume that making a change to a part of a record will require I/O of the whole thing.

So updating something within a JSON text string means:

retrieve the entire record
parse the JSON string*
fiddle with the JSON string*
recompose the record
write the updated record

The fiddle-faddling with the contents of the JSON string (the parts marked with the *) will occur no matter how you slice it -- it just depends whether you want that fiddling about to occur within the SQLite3 library or within your code. In either case, if the content of a field changes the entire record must be retrieved and recomposed and rewritten, no matter the "method" by which you "change the field".

All I/O is by page so if data on a page is needed, then that page must be read (if it is not already in the connection page cache). If a page "changes" (becomes after a change different than it was before the change) then the entire page will be written from the connection page cache to disk. Note that transaction logging may require that the "pre-change image" and/or the "after change image" of a page may need multiple reads and writes if a page contains changed data.

(4) By Gary (1codedebugger) on 2020-11-05 07:25:20 in reply to 2 [link] [source]

Thank you for the thorough explanation. I think I understand.

I thought it might be clear that one method would be quicker than the other, as it is for an indexedDB put operation on the entire string versus a get/put combination for an update; but I guess it isn't unless both methods are tested on the type of data the application will use.

At first, it would appear that the parsing, fiddling, and recomposing would consume a considerable portion of the total time required to complete the process, such that passing a complete string to replace the existing one would be quicker. But these steps take place in C and have to be compared to that which takes in the browser of stringifying a larger object from RAM and passing it via the native-messaging API to the C application.

I don't know enough about how the browser accomplishes these between C/C++ and JS.

My guess is replacing the entire string would be quicker, but my guess isn't worth much and the testing will be interesting.

Thanks again.

(3.1) By Gunter Hick (gunter_hick) on 2020-11-05 07:23:45 edited from 3.0 in reply to 1 [link] [source]

SQlite implements update rather like INSERT INTO ... SELECT, only without spurious changing of rowids and avoiding anomalies from updating fields that change the visitation order of records.

e.g.: create table t (id integer primary key, a integer, b jsontext, c real);

then running

update t set b = json_set(b,...) where ...;

will execute like

create temp table new_t as select id,a,json_set(b,...),c from t where ...; replace into t select * from new_t; drop table new_t;

Doing the same would require preparing two statements

select id,b from t where ...; update t set b = ? where id = ?;

and hoping that the visitation order is not based on field b.

(5) By Gary (1codedebugger) on 2020-11-05 21:21:24 in reply to 3.1 [link] [source]

Thank you for the explanation.

If the table is only two columns, such as create table t (id integer primary key, b jsontext ); and the json string in column b is to be completely replaced without the use of any JSON1 extension functions, will the existing record be deleted and the new one added without the need for table new_t?

(6) By Gunter Hick (gunter_hick) on 2020-11-06 07:22:04 in reply to 5 [link] [source]

UPDATE will never change the rowid, unless the rowid is mentioned in the SET field list. The number of fields int the table is irrelevant. SQLite may optimize the internal table away if the WHERE clause can be shown to limit the update to a single row, i.e. all the fields of any unique key are limited by an equality constraint.

UPDATE t SET b = ? WHERE id = ?;

is an example.