SQLite Forum

Setting non-binary columns to null in a table with a binary column populated with data creates free space and a larger file

(1) By Marc (mbeeson1) on 2020-07-13 06:37:42

Consider the following table schema:

CREATE TABLE [Data1](
  [Data] BLOB,
  [Items_id] INTEGER,
  [Key] TEXT);

If there is a row with the following data and the database is vacuumed: [Data] = <very large binary>, Items_id = 1, Key = "MyKey"

And we issue the following command: UPDATE Data1 SET Items_id = null, Key = null WHERE Items_id = 1;

After this, the database file size on disk appears to double, and sqlite3_analyzer's summary output reports 'Pages on the freelist' at roughly 50%.
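
For anyone who wants to reproduce this from scratch, a minimal sketch of the steps might look like the following (the 100 MB blob size is an arbitrary choice, and the freelist counts in the comments are the expected outcome, not measured output):

CREATE TABLE [Data1]([Data] BLOB, [Items_id] INTEGER, [Key] TEXT);
INSERT INTO Data1 VALUES (zeroblob(100*1024*1024), 1, 'MyKey');
VACUUM;
PRAGMA freelist_count;  -- expect 0 right after VACUUM
UPDATE Data1 SET Items_id = NULL, [Key] = NULL WHERE Items_id = 1;
PRAGMA freelist_count;  -- now roughly half of all pages are on the freelist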

(2) By Marc (mbeeson1) on 2020-07-13 06:41:01 in reply to 1

This was performed in SQLite Expert Professional 5.35.483, which uses SQLite version 3.32.3 [FTS3 FTS4 FTS5 RTREE ICU].

(3) By TripeHound on 2020-07-13 08:40:35 in reply to 1

The schema in the above reads

CREATE TABLE [Data1](
  [Data] BLOB, 
  [Items_id] INTEGER, 
  [Key] TEXT);

(4) By Keith Medcalf (kmedcalf) on 2020-07-13 08:56:23 in reply to 1

Yes, and ... ?

So it appears that the BLOB part of the record was re-written in the overflow pages even though the boundary split of the data between the on-page cell and the overflow was unchanged. To demonstrate:

sqlite> create table x(x,y,z);
sqlite> insert into x values (zeroblob(1024*1024),1,'MyKey');
sqlite> select * from dbstat;
┌───────────────┬─────────────┬────────┬──────────┬───────┬─────────┬────────┬────────────┬──────────┬────────┐
│     name      │    path     │ pageno │ pagetype │ ncell │ payload │ unused │ mx_payload │ pgoffset │ pgsize │
├───────────────┼─────────────┼────────┼──────────┼───────┼─────────┼────────┼────────────┼──────────┼────────┤
│ sqlite_schema │ /           │ 1      │ leaf     │ 1     │ 35      │ 3949   │ 35         │ 0        │ 4096   │
│ x             │ /           │ 2      │ leaf     │ 1     │ 1036    │ 3042   │ 1048588    │ 4096     │ 4096   │
│ x             │ /000+000000 │ 3      │ overflow │ 0     │ 4092    │ 0      │ 0          │ 8192     │ 4096   │
│ x             │ /000+000001 │ 4      │ overflow │ 0     │ 4092    │ 0      │ 0          │ 12288    │ 4096   │
... overflow pages 000+000002 to 000+0000fd each containing 4092 bytes elided
│ x             │ /000+0000fe │ 257    │ overflow │ 0     │ 4092    │ 0      │ 0          │ 1048576  │ 4096   │
│ x             │ /000+0000ff │ 258    │ overflow │ 0     │ 4092    │ 0      │ 0          │ 1052672  │ 4096   │
└───────────────┴─────────────┴────────┴──────────┴───────┴─────────┴────────┴────────────┴──────────┴────────┘
sqlite> select 1024*1024 - 256*4092;
┌──────────────────────┐
│ 1024*1024 - 256*4092 │
├──────────────────────┤
│ 1024                 │
└──────────────────────┘
sqlite>

So it appears that 1024 bytes of the blob are stored directly in the table's cell, with the remaining 1,047,552 bytes spread over 256 overflow pages of 4092 usable bytes each.

sqlite> update x set y=null, z=null where y=1;
sqlite> select * from dbstat;
┌───────────────┬─────────────┬────────┬──────────┬───────┬─────────┬────────┬────────────┬──────────┬────────┐
│     name      │    path     │ pageno │ pagetype │ ncell │ payload │ unused │ mx_payload │ pgoffset │ pgsize │
├───────────────┼─────────────┼────────┼──────────┼───────┼─────────┼────────┼────────────┼──────────┼────────┤
│ sqlite_schema │ /           │ 1      │ leaf     │ 1     │ 35      │ 3949   │ 35         │ 0        │ 4096   │
│ x             │ /           │ 2      │ leaf     │ 1     │ 1031    │ 3047   │ 1048583    │ 4096     │ 4096   │
│ x             │ /000+000000 │ 259    │ overflow │ 0     │ 4092    │ 0      │ 0          │ 1056768  │ 4096   │
│ x             │ /000+000001 │ 260    │ overflow │ 0     │ 4092    │ 0      │ 0          │ 1060864  │ 4096   │
... overflow pages 000+000002 to 000+0000fd each containing 4092 bytes elided
│ x             │ /000+0000fe │ 513    │ overflow │ 0     │ 4092    │ 0      │ 0          │ 2097152  │ 4096   │
│ x             │ /000+0000ff │ 514    │ overflow │ 0     │ 4092    │ 0      │ 0          │ 2101248  │ 4096   │
└───────────────┴─────────────┴────────┴──────────┴───────┴─────────┴────────┴────────────┴──────────┴────────┘
sqlite> select 1024*1024 - 256*4092;
┌──────────────────────┐
│ 1024*1024 - 256*4092 │
├──────────────────────┤
│ 1024                 │
└──────────────────────┘
sqlite>

Notice that after the update the cell payload in the table has changed (1036 to 1031 bytes, the 5 bytes of 'MyKey' no longer being stored), however the overflow still holds the same number of bytes and the table cell still contains the same first 1024 bytes of the blob. Yet the overflow now occupies brand-new pages (259 through 514), leaving the old pages (3 through 258) on the freelist.

Why are the overflow pages being re-written if the boundary has not changed (or are they just being re-written because the whole cell is being replaced)?

Note that the same thing happens even if the blob is the last field in the record.

(5) By Keith Medcalf (kmedcalf) on 2020-07-13 09:43:28 in reply to 4

Note that this is probably why it is recommended to store large fields (BLOB/TEXT) that consume overflow pages in a separate table and access them by reference, rather than mixing such large items into the same table as the small data associated with them.

For example:

create table Data1A            -- the small, frequently-updated columns
(
  Data_id integer not null unique references Data1B(Data_id),
  Items_id INTEGER,
  Key TEXT
);
create table Data1B            -- the large blob lives in its own table
(
  Data_id integer primary key,
  data blob
);
create view Data1 
         as select Data, 
                   Items_id, 
                   Key 
              from Data1A, Data1B
             where Data1A.Data_id == Data1B.Data_id
;
create trigger Data1Insert instead of insert on Data1
begin
  insert into Data1B (Data) values (new.Data);
  insert into Data1A values (last_insert_rowid(), new.Items_id, new.Key);
end;
create trigger Data1Update instead of update on Data1
begin
  -- re-write the blob only when it has actually changed
  update Data1B set data=new.data
   where Data_id == (select data_id from Data1A where Items_id IS old.items_id and Key IS old.key)
     and new.data is not old.data;
  update Data1A set Items_id=new.items_id, key=new.key
   where Items_id IS old.items_id and Key IS old.key;
end;
create trigger Data1Delete instead of delete on Data1
begin
  delete from Data1B
   where Data_id == (select data_id from Data1A where Items_id IS old.items_id and Key IS old.key);
  delete from Data1A
   where Items_id IS old.items_id and key IS old.key;
end;

This splits the blob (on the assumption that it will be large) out into its own table and provides a "view" that looks and works like the original table, but does not re-write the blob unless you are actually updating it.
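
A quick illustration of using the view (hypothetical, untested): nulling the small columns touches only Data1A, and the Data1B update inside the trigger matches no rows because new.data IS old.data.

INSERT INTO Data1 (Data, Items_id, Key) VALUES (zeroblob(1024*1024), 1, 'MyKey');
-- re-writes only the small Data1A row; the blob in Data1B is left untouched
UPDATE Data1 SET Items_id = NULL, Key = NULL WHERE Items_id = 1;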

(6) By Gunter Hick (gunter_hick) on 2020-07-13 09:51:31 in reply to 1

This has to do with the way SQLite stores rows, and also with (intentionally?) disregarding the design guidelines regarding BLOB fields.

A row in the SQLite format is stored as a "manifest" (a list of the serial types of the data stored in the row; the file-format documentation calls this the record header) and the "payload" (the actual data stored in the fields).

To access the value of a field, SQLite has to parse the manifest to find the data types and skip the appropriate number of bytes in the payload to locate the value. Therefore it is recommended to put BLOBs last in the field list, so that the values of the other fields may be read without accessing the overflow area.

The original record in your example has the following representation:

Manifest:
.) header length
.) BLOB size x
.) INTEGER 1
.) TEXT 5

Payload:
x bytes BLOB
(nothing)
"MyKey"

After your updates, the record looks like:

Manifest:
.) header length
.) BLOB size x
.) NULL
.) NULL

Payload:
x bytes BLOB
(nothing)
(nothing)

Note that some values don't get stored in the payload at all: specifically, NULL, 0 and 1 are represented directly in the manifest. Setting some fields to NULL therefore makes the representation shorter. And because your BLOB field is at the beginning of the field list, the overflow area has to be accessed and updated to reflect that.
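
Concretely, from the serial-type encodings in the SQLite file-format documentation:

-- serial type 0  = NULL           -> 0 payload bytes
-- serial type 8  = integer 0      -> 0 payload bytes
-- serial type 9  = integer 1      -> 0 payload bytes
-- serial type 23 = 5-byte text    -> 5 payload bytes ('MyKey', type 13 + 2*5)

This matches the dbstat output in post 4: the cell payload shrinks from 1036 to 1031 bytes, exactly the 5 bytes of 'MyKey' that are no longer stored.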

(7) By Keith Medcalf (kmedcalf) on 2020-07-13 11:17:14 in reply to 6

That appears to be true. However, even if you put the blob at the "end" of the record, the contents of the overflow pages are still overwritten when the payload length changes; they are not overwritten if the payload length stays the same.

So yes, putting "long fields" (or just having long records) with overflow chains means that the entire overflow must be read through and decoded in order to get to any data which appears "after" them; and when you re-write a record and the size of the payload changes, the entire cell (including the overflow pages) is re-written.
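
A sketch of how one might check this (hypothetical session; assumes a build with the DBSTAT virtual table, as in the standard CLI):

create table x2(a, b, c);                         -- blob last this time
insert into x2 values (1, 'MyKey', zeroblob(1024*1024));
select min(pageno), max(pageno) from dbstat where name='x2' and pagetype='overflow';
update x2 set b='MyKex' where a=1;                -- same payload length
select min(pageno), max(pageno) from dbstat where name='x2' and pagetype='overflow';
-- expectation: unchanged page numbers (no overflow rewrite)
update x2 set b=null where a=1;                   -- payload shrinks by 5 bytes
select min(pageno), max(pageno) from dbstat where name='x2' and pagetype='overflow';
-- expectation: a brand-new range of overflow page numbers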

(8) By Marc (mbeeson1) on 2020-07-13 20:41:39 in reply to 7

Yes, with the database for our application, the long fields are the last field in each table and the file size continues to grow when the update command is executed.

From reading the previous replies, it seems we have to change our schema design to store the long fields in a separate table to work around this issue.

Thanks for looking into this and providing a detailed explanation of what is going on behind the scenes.

(9) By Dan Kennedy (dan) on 2020-07-13 21:12:54 in reply to 8

"the file size continues to grow when the update command is executed"

It doesn't grow with every UPDATE statement, only the first, correct?

i.e. if you execute one of your UPDATE statements you should end up with X or fewer free pages in the db, where X is the number of overflow pages used by the largest record in the table. Executing subsequent updates of the type you mention should not change this: there will still be only X or fewer free pages in the db.
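
Using a fresh copy of the table from post 4, a check of this prediction might look like this (hypothetical; the counts in the comments are what the explanation above predicts, not measured output):

pragma freelist_count;                      -- 0 in a freshly vacuumed db
update x set y=null, z=null where rowid=1;  -- first such update: chain rewritten
pragma freelist_count;                      -- ~256 pages now on the freelist
update x set y=1, z='MyKey' where rowid=1;  -- put the values back
update x set y=null, z=null where rowid=1;  -- ...and null them again
pragma freelist_count;                      -- still ~256; the file stops growing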

(10) By Marc (mbeeson1) on 2020-07-13 22:17:59 in reply to 9

Correct. If I update the row and set the non-blob fields to have values, then update them back to null, the file size remains the same.

We actually use NHibernate to manage our database, and when an item gets deleted NHibernate issues an update that nulls the reference columns, then issues a delete command. This is where one of our users noticed their project was growing after deleting very large items. (Some of these items can be hundreds of MB.)

I narrowed down the growing behavior to the update command, which surprised me. If only a delete command is used, the file size remains the same (which is expected).
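
For what it's worth, the two paths boil down to something like this (a hypothetical sketch of the statements involved, using the Data1 schema from the first post):

-- NHibernate-style delete: the UPDATE rewrites the blob's overflow chain first
UPDATE Data1 SET Items_id = NULL, [Key] = NULL WHERE Items_id = 1;
DELETE FROM Data1 WHERE Items_id IS NULL;   -- the file has already grown by now
-- plain delete: overflow pages simply move to the freelist, no growth
DELETE FROM Data1 WHERE Items_id = 1;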