SQLite Forum

Understanding memory allocation behavior of SQLite
Login
I generated a CSV file containing a bunch of random data using the following Python code:

```
import random

csv = open('x.csv', 'w')
print('col1,col2,col3,col4,col5', file=csv)
r = 0
a = [x for x in range(ord('A'), ord('Z')+1)]
u = 2**62
while r < 77000000:
    r += 1
    random.shuffle(a)
    print(random.randint(1,u), end=',', file=csv)
    print(random.randint(1,u), end=',', file=csv)
    print(random.randint(1,u), end=',', file=csv)
    print(random.randint(1,u), end=',', file=csv)
    print(''.join(map(chr,a)), file=csv)
csv.close()
```

Creating the WITHOUT ROWID table and extra index, and then loading the random data results as follows:

```
create virtual table s using vsv (filename = x.csv, header, affinity=integer);
Run Time: real 0.001 user 0.000000 sys 0.000000
create table x (col1 integer not null, col2 integer not null, col3 integer not null, col4 integer not null, col5 text not null, primary key(col1, col2, col3)) without rowid;
Run Time: real 0.000 user 0.000000 sys 0.000000
create index xa on x (col4, col5);
Run Time: real 0.000 user 0.000000 sys 0.000000
insert into x select * from s;
Run Time: real 512.051 user 503.484375 sys 8.156250

TimeThis :  Elapsed Time :  00:08:51.023
```

Doing the same thing using a standard table (not a WITHOUT ROWID table) looks like this:

```
create virtual table s using vsv (filename = x.csv, header, affinity=integer);
Run Time: real 0.000 user 0.000000 sys 0.000000
create table x (col1 integer not null, col2 integer not null, col3 integer not null, col4 integer not null, col5 text not null, primary key(col1, col2, col3));
Run Time: real 0.000 user 0.000000 sys 0.000000
create index xa on x (col4, col5);
Run Time: real 0.000 user 0.000000 sys 0.000000
insert into x select * from s;
Run Time: real 487.248 user 478.390625 sys 8.484375

TimeThis :  Elapsed Time :  00:08:29.620
```


Loading the same data directly into a standard table (not a WITHOUT ROWID table) and then creating the indexes results as follows:

```
create virtual table s using vsv (filename = x.csv, header, affinity=integer);
Run Time: real 0.001 user 0.000000 sys 0.000000
create table x (col1 integer not null, col2 integer not null, col3 integer not null, col4 integer not null, col5 text not null);
Run Time: real 0.000 user 0.000000 sys 0.000000
insert into x select * from s;
Run Time: real 114.776 user 110.484375 sys 4.156250
create unique index xp on x (col1, col2, col3);
Run Time: real 48.642 user 146.171875 sys 24.843750
create index xa on x (col4, col5);
Run Time: real 47.736 user 150.078125 sys 36.671875

TimeThis :  Elapsed Time :  00:03:34.656
```

Loading the data in sorted order and then creating the non-primary index looks thus for a WITHOUT ROWID table:

```
create virtual table s using vsv (filename = x.csv, header, affinity=integer);
Run Time: real 0.001 user 0.000000 sys 0.000000
create table x (col1 integer not null, col2 integer not null, col3 integer not null, col4 integer not null, col5 text not null, primary key(col1, col2, col3)) without rowid;
Run Time: real 0.001 user 0.000000 sys 0.000000
insert into x select * from s order by col1, col2, col3;
Run Time: real 187.497 user 253.875000 sys 48.234375
create index xa on x (col4, col5);
Run Time: real 52.749 user 150.750000 sys 50.046875

TimeThis :  Elapsed Time :  00:04:03.667
```

Creating the WITHOUT ROWID table and its secondary index, and then loading the data in-order of the primary key has the following result (as expected, since we are inserting in-order to one B-Tree and in random order to the other):

```
create virtual table s using vsv (filename = x.csv, header, affinity=integer);
Run Time: real 0.001 user 0.000000 sys 0.000000
create table x (col1 integer not null, col2 integer not null, col3 integer not null, col4 integer not null, col5 text not null, primary key(col1, col2, col3)) without rowid;
Run Time: real 0.000 user 0.000000 sys 0.000000
create index xa on x (col4, col5);
Run Time: real 0.000 user 0.000000 sys 0.000000
insert into x select * from s order by col1, col2, col3;
Run Time: real 392.314 user 454.437500 sys 51.906250

TimeThis :  Elapsed Time :  00:06:51.696
```

The conclusion is that the most efficient way to load the data is to insert it into a standard (ROWID) table and then create the indexes.  Alternatively, loading the data in sorted order and then creating the additional index (after the data is loaded) also works and is the only alternative for a WITHOUT ROWID table to avoid B-Tree rebalance operations.