Understanding memory allocation behavior of SQLite

(1) By srinarasi on 2021-10-08 11:51:10 [link] [source]

Some context:

I have a simple in-memory database with a single table and an index. Here's the schema for the table. CREATE TABLE 'table_name' ('col1' INTEGER, 'col2' INTEGER, 'col3' INTEGER, 'col4' INTEGER, 'col5' STRING, PRIMARY KEY('col1','col2','col3')) WITHOUT ROWID;";

The index is on one of the integers and the string. This community helped me improve the insert performance quite a bit already. But I am still not satisfied with the performance of the memory allocator.

My table has 77 million rows and by my calculation has 2 GB of data. But around 80% of the time is spent on malloc/free. So I provided my own allocator functions and found out that SQLite asked my allocator to allocate 650GB of data in total (and of course also to free most of it) which surprised me. Here's the data of allocation sizes and counts for my table.

I also run sqlite3_db_config(_database, SQLITE_DBCONFIG_LOOKASIDE, 2048, 2048); before I create the table with the hope that it reduces the necessity for smaller allocations. But it doesn't seem to help much.

Using the default allocator on Windows, it takes more than 20 minutes to populate the data where as using mimalloc, the performance is much better.

No queries or any other SQLite operations are happening during the test.

Here are my questions.

Why does SQLite allocate 650 GB for populating a much smaller DB?
Is there any way to reduce it? (Except doing all the memory management myself?)
SQLITE_STATUS_MEMORY_USED indicates that SQLite is using ~6.5 GB of data but my calculations, DB is only 2GB in size. What could explain this discrepancy?
I also see that the current value of SQLITE_STATUS_MEMORY_USED almost doubles after the last row has been written to the DB. Why could that be?

Regards,

Bhargava Srinarasi

(2) By Simon Slavin (slavin) on 2021-10-09 15:16:38 in reply to 1 updated by 2.1 [link] [source]

Out of interest, can you leave your memory monitoring hooks in place but test this ?

1) Create the table, allowing SQLite to assign ROWIDs, and without defining any indexes.
2) Load your 77 million rows of data, using transactions to bunch together INSERTs as before
3) Create a UNIQUE INDEX on <code>('col1','col2','col3')</code>

Does this turn out faster or slower than the procedure you describe above ?  Does it allocate more or less memory ?

(2.1) By Simon Slavin (slavin) on 2021-10-10 02:56:45 edited from 2.0 in reply to 1 [link] [source]

Out of interest, can you leave your memory monitoring hooks in place but test this ?

Create the table, allowing SQLite to assign ROWIDs, and without the PRIMARY KEY.
Load your 77 million rows of data, using transactions to bunch together INSERTs as before
Create a UNIQUE INDEX on ('col1','col2','col3')

Does this turn out faster or slower than the procedure you describe above ? Does it allocate more or less memory ?

(3) By anonymous on 2021-10-09 20:13:43 in reply to 1 [link] [source]

SQLite's in memory db uses the same page layout that a disk based db would, it just never writes to the disk, thus if your 77M inserts are done separately, or even in batches but they involve wildly spaced primary keys then each row will probably have to allocate a full page, thus 4KB per inserted row, Multiply that by 77M and you get 300+ GB of allocated memory, now index inserts will also require page sized memory allocations, so you should double that to 600+ GB, roughly what you are seeing.

(4) By anonymous on 2021-10-09 20:18:26 in reply to 1 [source]

Same anonymous here!

Forgot to say, one way to quickly test this (and reduce allocations a lot as well) is to try using a smaller page size, unless your records are really fat may be you should try 1KB or even 512B

(5) By Keith Medcalf (kmedcalf) on 2021-10-09 22:36:09 in reply to 1 [link] [source]

In the following table definition taken from your post (with all your bad quotes fixed):

CREATE TABLE table_name 
(
   col1 INTEGER, 
   col2 INTEGER, 
   col3 INTEGER, 
   col4 INTEGER, 
   col5 STRING, 
   PRIMARY KEY(col1, col2, col3)
) WITHOUT ROWID;

Firstly, the index is on (col1, col2, col3). You would need a CREATE INDEX statement to create an additional index.

Secondly, STRING is not a known datatype. It is an alias for NUMERIC. If you actually intend to store "UNICODE TEXT" in this column then it may be perspicacious to declare the type as TEXT in order to avoid forcing the exercize of the conversion path (and the multiple malloc/free calls associated therewith) which will never be applicable (and cause a massive usage of CPU and memory for no useful purpose).

Thirdly, are you loading the table in-order?

(6) By Keith Medcalf (kmedcalf) on 2021-10-09 22:53:22 in reply to 1 updated by 6.1 [link] [source]

You will also not that if your chema is thus:

```
CREATE TABLE table_name 
(
   col1 INTEGER, 
   col2 INTEGER, 
   col3 INTEGER, 
   col4 INTEGER, 
   col5 STRING, 
   PRIMARY KEY(col1, col2, col3)
) WITHOUT ROWID
;
CREATE INDEX index_name ON table_name (col4, col5)
;
```

then in addition to all the fiddle-faddle trying to "convert" `col5` from however you present it into a REAL or INTEGER (since you have indicated a preference for storage as an INTEGER, then if that does not work, as REAL, and if that does not work, then "as presented") you are also storing the contents of EVERY ROW TWICE.

Once in the WITHOUT ROWID B-Tree and then yet again in the index_name B-Tree.  The only difference between the two B-Tree's being the `key` and the `payload`, but all records are duplicated completely.

Since it is not possible to insert in-order, then no matter what you do you will be requiring TWO B-Tree rebalance operations FOR EACH ROW (batch) inserted.

Perhaps you need to re-visit your "design".

(6.1) By Keith Medcalf (kmedcalf) on 2021-10-09 22:55:23 edited from 6.0 in reply to 1 [link] [source]

You will also note that if your schema is thus:

CREATE TABLE table_name 
(
   col1 INTEGER, 
   col2 INTEGER, 
   col3 INTEGER, 
   col4 INTEGER, 
   col5 STRING, 
   PRIMARY KEY(col1, col2, col3)
) WITHOUT ROWID
;
CREATE INDEX index_name ON table_name (col4, col5)
;

then in addition to all the fiddle-faddle trying to "convert" col5 from however you present it into a REAL or INTEGER (since you have indicated a preference for storage as an INTEGER, then if that does not work, as REAL, and if that does not work, then "as presented") you are also storing the contents of EVERY ROW TWICE.

Once in the WITHOUT ROWID B-Tree and then yet again in the index_name B-Tree. The only difference between the two B-Tree's being the key and the payload (all columns are required in both B-Tree's but the ordering is different), but all records are duplicated completely.

Since it is not possible to insert in-order, then no matter what you do you will be requiring TWO B-Tree rebalance operations FOR EACH ROW (batch) inserted.

Perhaps you need to re-visit your "design".

(7) By Keith Medcalf (kmedcalf) on 2021-10-10 03:36:18 in reply to 1 [link] [source]

I generated a CSV file containing a bunch of random data using the following Python code:

import random

csv = open('x.csv', 'w')
print('col1,col2,col3,col4,col5', file=csv)
r = 0
a = [x for x in range(ord('A'), ord('Z')+1)]
u = 2**62
while r < 77000000:
    r += 1
    random.shuffle(a)
    print(random.randint(1,u), end=',', file=csv)
    print(random.randint(1,u), end=',', file=csv)
    print(random.randint(1,u), end=',', file=csv)
    print(random.randint(1,u), end=',', file=csv)
    print(''.join(map(chr,a)), file=csv)
csv.close()

Creating the WITHOUT ROWID table and extra index, and then loading the random data results as follows:

create virtual table s using vsv (filename = x.csv, header, affinity=integer);
Run Time: real 0.001 user 0.000000 sys 0.000000
create table x (col1 integer not null, col2 integer not null, col3 integer not null, col4 integer not null, col5 text not null, primary key(col1, col2, col3)) without rowid;
Run Time: real 0.000 user 0.000000 sys 0.000000
create index xa on x (col4, col5);
Run Time: real 0.000 user 0.000000 sys 0.000000
insert into x select * from s;
Run Time: real 512.051 user 503.484375 sys 8.156250

TimeThis :  Elapsed Time :  00:08:51.023

Doing the same thing using a standard table (not a WITHOUT ROWID table) looks like this:

create virtual table s using vsv (filename = x.csv, header, affinity=integer);
Run Time: real 0.000 user 0.000000 sys 0.000000
create table x (col1 integer not null, col2 integer not null, col3 integer not null, col4 integer not null, col5 text not null, primary key(col1, col2, col3));
Run Time: real 0.000 user 0.000000 sys 0.000000
create index xa on x (col4, col5);
Run Time: real 0.000 user 0.000000 sys 0.000000
insert into x select * from s;
Run Time: real 487.248 user 478.390625 sys 8.484375

TimeThis :  Elapsed Time :  00:08:29.620

Loading the same data directly into a standard table (not a WITHOUT ROWID table) and then creating the indexes results as follows:

create virtual table s using vsv (filename = x.csv, header, affinity=integer);
Run Time: real 0.001 user 0.000000 sys 0.000000
create table x (col1 integer not null, col2 integer not null, col3 integer not null, col4 integer not null, col5 text not null);
Run Time: real 0.000 user 0.000000 sys 0.000000
insert into x select * from s;
Run Time: real 114.776 user 110.484375 sys 4.156250
create unique index xp on x (col1, col2, col3);
Run Time: real 48.642 user 146.171875 sys 24.843750
create index xa on x (col4, col5);
Run Time: real 47.736 user 150.078125 sys 36.671875

TimeThis :  Elapsed Time :  00:03:34.656

Loading the data in sorted order and then creating the non-primary index looks thus for a WITHOUT ROWID table:

create virtual table s using vsv (filename = x.csv, header, affinity=integer);
Run Time: real 0.001 user 0.000000 sys 0.000000
create table x (col1 integer not null, col2 integer not null, col3 integer not null, col4 integer not null, col5 text not null, primary key(col1, col2, col3)) without rowid;
Run Time: real 0.001 user 0.000000 sys 0.000000
insert into x select * from s order by col1, col2, col3;
Run Time: real 187.497 user 253.875000 sys 48.234375
create index xa on x (col4, col5);
Run Time: real 52.749 user 150.750000 sys 50.046875

TimeThis :  Elapsed Time :  00:04:03.667

Creating the WITHOUT ROWID table and its secondary index, and then loading the data in-order of the primary key has the following result (as expected, since we are inserting in-order to one B-Tree and in random order to the other):

create virtual table s using vsv (filename = x.csv, header, affinity=integer);
Run Time: real 0.001 user 0.000000 sys 0.000000
create table x (col1 integer not null, col2 integer not null, col3 integer not null, col4 integer not null, col5 text not null, primary key(col1, col2, col3)) without rowid;
Run Time: real 0.000 user 0.000000 sys 0.000000
create index xa on x (col4, col5);
Run Time: real 0.000 user 0.000000 sys 0.000000
insert into x select * from s order by col1, col2, col3;
Run Time: real 392.314 user 454.437500 sys 51.906250

TimeThis :  Elapsed Time :  00:06:51.696

The conclusion is that the most efficient way to load the data is to insert it into a standard (ROWID) table and then create the indexes. Alternatively, loading the data in sorted order and then creating the additional index (after the data is loaded) also works and is the only alternative for a WITHOUT ROWID table to avoid B-Tree rebalance operations.

(8) By Keith Medcalf (kmedcalf) on 2021-10-10 03:46:12 in reply to 7 [link] [source]

Note that this does not answer your question(s) directly. However, minimizing the amount of "work" being done also reduces the number of calls to malloc/free, and addressing the "root cause" will be more efficient in the long run (that is, if it takes you 20 minutes to shovel the snow out of the driveway, then having the shovel "empty faster" is not likely to make much difference. However, moving from Edmonton to Miami will speed up the process.)

(9) By Larry Brasfield (larrybr) on 2021-10-10 03:48:02 in reply to 7 [link] [source]

Without gainsaying anything earlier, I wonder about this:

Alternatively, loading the data in sorted order and then creating the additional index (after the data is loaded) also works and is the only alternative for a WITHOUT ROWID table to avoid B-Tree rebalance operations.

I would think that the insert order would need to be close enough to or match a breadth-first traversal of the ultimate (as) balanced (as it's going to get) B-tree. Otherwise, lots of rebalancing will be needed because, ultimately the ordering will match a depth-first traversal but insertion in that order would necessarily create imbalances. (I'll be happy to be educated on this if I've got it wrong.)

(10) By Keith Medcalf (kmedcalf) on 2021-10-10 03:58:43 in reply to 9 [link] [source]

in-order loading will not eliminate splits, only rebalancing.

See https://en.wikipedia.org/wiki/B%2B_tree particularly the section on bulk-loading.

Insertion in sorted (index B-Tree) order has been used since the 1940's to "speed up" table load operations (it is "old technology").

(11) By Keith Medcalf (kmedcalf) on 2021-10-10 04:02:35 in reply to 10 [link] [source]

You will also note that the default behaviour of CREATE INDEX is to build the index by in-order insertion. That is why the process

create table
insert data
create indexes

is quicker than

create table
create indexes
insert data

(12) By Keith Medcalf (kmedcalf) on 2021-10-10 04:31:04 in reply to 10 [link] [source]

Actually, that would be the 1960's or 1970's, when VSAM KSDS arrived. Prior to that ISAM used fixed index (internal) nodes.

VSAM KSDS later became known as "a B-Tree" (Of B- B+ B* B*+ varieties).