Slow update table with large BLOB issue(solution)
(1) By codeservice on 2020-06-18 13:56:18 [link] [source]
I know best practice to work with large BLOB is to keep BLOB in second table and connect by key ID. Problem I have table, which is not created by me. I found way to update table with large BLOB fast, only not sure if this increasing updating speed result of other issue or acceptable method.
So table: CREATE TABLE IF NOT EXISTS Test ( id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL, modified DATETIME, contentLARGE BLOB )
First slow update method, because of contentLARGE BLOB: UPDATE Test SET name=?, modified=CURRENT_TIMESTAMP WHERE id=?
Second fast: WITH RECURSIVE tc( i ) AS ( select id from Test where id = ?) UPDATE Test SET name=?, modified=CURRENT_TIMESTAMP WHERE id in tc
My question why second method fast and first slow? Is it acceptable use second method for update and if so, why first direct update method not using similar technique?
(2) By Keith Medcalf (kmedcalf) on 2020-06-18 15:31:43 in reply to 1 [link] [source]
The second method must be slower than the first because it is executing approximately twice the number of VDBE instructions to accomplish the same task.
The only conclusion, therefore, is that your testing methodology to determine "fastness" is flawed.
(3) By codeservice on 2020-06-18 15:50:15 in reply to 2 [link] [source]
Eh, my mistake with checking second method "fastness" I updated "name" field with the same text length, as soon as I increase "name" content length, second method become slow too. It does look like SQLite reposition all fields, including large BLOB with every update even if it doesnt include BLOB.
(4) By Keith Medcalf (kmedcalf) on 2020-06-18 17:22:57 in reply to 3 [source]
That is correct.
Pretty much everything in the SQLite3 record format is variable length (that is, only REAL/IEEE 754 double precision floating point, are a fixed length).
However, if the "new" page content is the same as the "old" page content, then the page is not written (this is an internal optimization where only pages that actually change are written -- if the "before" and "after" page content is the same, there is no point in writing the change is there?)
So if you change the "length" of a record (cell) SQLite3 has to rewrite the entire cell including the overflow pages (where part of the data that will not fit on a page is stored) because the "page boundaries" have changed.
That means that if you update something in a record (cell) such that the "overflow pages" remain the same, then only the page containing the changed part of the record (cell) is re-written.
However as soon as you update the "length" of the record (cell) (ie, update the length of something inside a cell) the "overflow" pages (all data after that change) are now no longer at the same "offset" as they were and hence get overwritten.
This applies in all cases where the length of the record (cell) will not fit within a page. If you change the "boundary" then all the overflow pages must be re-written an well.
This is why you might want to store data in "long fields" (blobs and text strings) in a separate table so that updates do not cause the entire (could be quite long) record to have to be re-written.
Note that this is not necessarily only related to long fields, but any record (cell) that will not fit on a single page. If the boundary between what is stored on a page and what is stored on an "overflow" page changes (thus changing the data in the overflow pages), then all the associated pages must be updated as well.
Hopefully this description will help with understanding what is going on in the "internal implementation" of the default underlying storage layer and is an implementation detail.
For more complete documentation on the data storage format see https://sqlite.org/fileformat.html
(5) By codeservice on 2020-06-18 17:46:19 in reply to 4 [link] [source]
Thank you for clarification! This was a bit unexpected. Most DB types based on fixed field size declaration.