Fast way to insert rows in SQLite

(1.1) By Avinash (avinash) on 2021-05-11 08:17:50 edited from 1.0 [link] [source]

Hey everyone! I am very new to SQLite and I am doing a little experiment where I am trying to figure out really fast way to insert rows in SQLite. I am fine with db corruptions, issues due to process crash / OS.

I have been able to insert 100M rows in 33 seconds on my local machine. I was wondering if there are any further optimisations / tricks I can do.

I create a db with one table which has 3 columns. Then I randomly generate these 3 values and insert them. I found this excellent Stack Overflow post and trying it out.

Here's what I have done so far:

PRAGMA settings:

PRAGMA journal_mode = OFF;
PRAGMA synchronous = 0;
PRAGMA cache_size = 1000000;
PRAGMA locking_mode = EXCLUSIVE;
PRAGMA temp_store = MEMORY;",

Wrap all insertions in a transaction
Use a prepared statement
Insert in batches: First I inserted each row one after another, it took about 60s. Then I did a batch insert (of 100 rows) with prepared statement it came down to 33 seconds.

If you would like to check my code:

the rust version, does 100M inserts in 33ish seconds.
almost similar python version, does 100M inserts in 150ish seconds using PyPy.

What else I can do to speed this even further?

Any PRAGMA settings I can tweak further?
Does doing insertions multithreaded mode can help? I am really not sure what should be my journal_mode and other settings should be if I want to use multithreading.

Appreciate any suggestions. Thank you =)

My machine details: mac OS Big Sur, 2.4Ghz Quad Core Intel Core i5, 8GB RAM, 120GB SSD

I run using gnu time for measuring it:

/usr/bin/time bench_binary
33.98 real
30.40 user
2.58 sys

(2.1) By ddevienne on 2021-05-11 08:48:38 edited from 2.0 in reply to 1.1 [link] [source]

Beside playing with the page_size, there isn't much that can be improved.

The default is 4KB pages these days, and you might gain a little with 8KB
and 16KB at insertion time, at the expense of more IO later on updates and
deletes, so there's a balance to find specific to what you are doing.

Note that you cache_size is affected by the page_size, unless you switch to
a negative value, for an absolute value in KBs. See pragma_cache_size

Multi-threading won't help at all on the SQLite side. But you can of course
prepare all your batches in a separate thread, and use an SPSC queue to pass
those batches around to the SQLite thread, to at least get some concurrency
between the SQLite code, and your own code.

(But you probably want your queue to be fixed-size and blocking when full,
to throttle the producer-side, which is going to be (much?) faster than the
SQLite consumer side actually doing the IO)

Maybe others will have further suggestions. --DD

(3) By Richard Hipp (drh) on 2021-05-11 10:24:00 in reply to 1.1 [link] [source]

What is the schema of the table you are inserting into? If you are able to omit UNIQUE constraints and PRIMARY KEYs (other than INTEGER PRIMARY KEY), and use "CREATE UNIQUE INDEX" statements that happen after the insert in place of UNIQUE constraints, that will probably work faster.

(4) By ddevienne on 2021-05-11 10:52:40 in reply to 3 [link] [source]

Here's the DDL from the OP's Rust code linked above:

        "CREATE TABLE IF NOT EXISTS user (
                id INTEGER not null primary key,
                area CHAR(6),
                age INTEGER not null,
                active INTEGER not null)",

(5) By David Raymond (dvdraymond) on 2021-05-11 12:58:59 in reply to 1.1 [link] [source]

Not sure that it would help at all for SQLite, but one example from remote DBMS inserts is using multiple records in your values statement.

So instead of...

INSERT INTO user VALUES (NULL,?,?,?);

...binding 3 values for 1 record per INSERT statement

You might try doing 10 at a time:

INSERT INTO user VALUES (NULL,?,?,?), (NULL,?,?,?), (NULL,?,?,?), (NULL,?,?,?)... x10

...binding 30 values for 10 records per INSERT statement

(6.2) By Avinash (avinash) on 2021-05-11 13:47:39 edited from 6.1 in reply to 5 [link] [source]

Hey, it does help in the case of SQLite. Here is what I did,

Without batching:

I called INSERT INTO user VALUES (NULL,?,?,?); for 100M and the whole thing took 60s:
```
59.64 real
57.01 user
2.27 sys
```
The code is here
With batching:

I batched, did inserts of 50 rows at once, INSERT INTO user VALUES (NULL,?,?,?) (NULL,?,?,?) x50 in loop of 100M/50, it took 34s:
```
34.04 real
31.66 user
2.20 sys
```
The code is here

I also tried increasing the batch size to 100, it was almost same and at 500 it took more time.

(7) By Avinash (avinash) on 2021-05-11 13:09:51 in reply to 4 [link] [source]

Thanks for posting here. And yes, I confirm that I am not creating any indexes other than the primary key one

(8.1) By Simon Slavin (slavin) on 2021-05-11 13:22:16 edited from 8.0 in reply to 1.1 [link] [source]

https://docs.python.org/3/library/sqlite3.html

The Python sqlite3 module by default issues a BEGIN statement
implicitly before a Data Modification Language (DML) statement
(i.e. INSERT/UPDATE/DELETE/REPLACE).
[...]
Changed in version 3.6: sqlite3 used to implicitly commit an
open transaction before DDL statements. This is no longer the
case.

Given the above, I'm not sure whether you are correctly controlling your transactions. I would be explicitly issue both BEGIN and COMMIT commands. But I don't know enough about Python to understand the implications.

Not at all sure that this is your problem, but it might help you figure out why Python is so much slower than Rust.

(9) By anonymous on 2021-05-11 13:25:10 in reply to 1.1 [link] [source]

This isn't an answer to your question, but if the data is already in some other container exposing it through a virtual table can eliminate the need to do an insert.

(10) By Avinash (avinash) on 2021-05-11 13:26:42 in reply to 2.1 [link] [source]

The default is 4KB pages these days, and you might gain a little with 8KB and 16KB at insertion time, at the expense of more IO later on updates and deletes, so there's a balance to find specific to what you are doing.

I tried increasing it to 8KB, I didn't see much difference. Let me try again with 8KB and 16KB and report here.

Multi-threading won't help at all on the SQLite side. But you can of course prepare all your batches in a separate thread, and use an SPSC queue to pass those batches around to the SQLite thread, to at least get some concurrency between the SQLite code, and your own code.

I tried this, but went with MPSC. The numbers are almost similar as non-threaded:

   34.53 real
   45.11 user
   4.47 sys

(It could be very much possible that I am doing something wrong, I am new to Rust as well. Linked tthe code at the end)

(But you probably want your queue to be fixed-size and blocking when full, to throttle the producer-side, which is going to be (much?) faster than the SQLite consumer side actually doing the IO)

I used the unbounded channel, so producer won't block on it. Can you tell me what difference it would make?

here is my code.

(11) By Avinash (avinash) on 2021-05-11 13:31:09 in reply to 9 [link] [source]

Unfortunately the data doesn't exist. However, this does give me a reason to experiment more using virtual tables. I shall do that later :D

(12) By David Raymond (dvdraymond) on 2021-05-11 13:33:11 in reply to 8.1 [link] [source]

For the Python library, adding in "isolation_level = None" in the connection call gets rid of all the automatic transaction handling and leaves it in autocommit mode/full manual transaction control.

(13) By Stephan Beal (stephan) on 2021-05-11 13:33:58 in reply to 10 [link] [source]

I tried increasing it to 8KB, I didn't see much difference. Let me try again with 8KB and 16KB and report here.

Writing it out to an in-memory db, provided you have enough RAM, will tell you your theoretical maximum speed - writing it to persistent storage faster than that won't be possible. You could use such a test to estimate the approximate performance cost/overhead of the storage layer.

(14) By ddevienne on 2021-05-11 14:07:38 in reply to 10 [link] [source]

16KB was faster for me, but your row data is tiny, mine was bigger (strings and blobs, up to 1MB per row).

If you measured the time the producing side takes,
versus the consuming (SQLite) side, you'll see the majority is on the SQLite side (> 99% I suspect).

So creating many producers (1-per CPU) to parallelize the 1% or less of the total, is not going to help much. A single producer is enough.

Plus having many producers increases contention too.

W/o using a fixed-size queue, and having a producer much faster than the consumer side, you are actually accumulating too much memory in the queue, increasing allocation, etc...
A fixed-size queue blocks the producer when the queue is full, and wakes it up only after the consumer processed one-or-more items.

Measure peak RAM, in addition to wall time. With a fixed-size queue, the peak RAM should be tiny.
While w/o one, I expect the peak RAM to be much larger, close to the full data side in your case, the producing side being so much faster.

In a real world scenario, there's not such an imbalance between what SQLite must do, and what the rest of the code must do,
so the SPSC approach on two threads works fine, up to a maximum of 2x faster in the perfect case.
But it can't be lower than max(producer, consumer) so if your consumer is already at 99%, at most you save 1%...

At over 3M rows per seconds, you're already very fast. Can't get any faster IMHO.

Stephan's suggestion to use an in-memory DB is also a good one.
Will give you the maximum throughput w/o the vagaries of the filesystem.

(15) By Avinash (avinash) on 2021-05-11 14:58:32 in reply to 1.1 [link] [source]

I ran another experiment, to see if there would be any perf boost if I generate the pk instead of SQLite.

When I let SQLite generate PK, it took 60s:
```
59.99 real
56.83 user
2.33 sys
```
code here
When I generated it in the code, it took 75s, almost 15s extra!:
```
76.12 real
71.95 user
2.71 sys
```
code here

Since SQLite makes a function call I thought providing pk may be turn out faster. How wrong I was!

(16) By Bill Wade (billwade) on 2021-05-11 15:06:51 in reply to 1.1 [link] [source]

"CREATE TABLE IF NOT EXISTS user ( id INTEGER not null primary key, area CHAR(6), age INTEGER not null, active INTEGER not null)"

Depending on the id value, its storage might be around 7 bytes.

Assuming ages are in 0 to 127, and active values are 0 or 1 the SQLite storage for those is two or three bytes (1 byte per column "overhead", and another byte holds age when it is more than 1)

You are probably looking at around 17 bytes per row if length(area) is typically 6 bytes (look at your database disk size after you've inserted a million records to see if I'm right).

You could probably save 2 bytes per row by using a BLOB to hold all of the non-key information. The blob could hold age and active in a single byte, and its remaining bytes would hold area.

CREATE TABLE IF NOT EXISTS user (id INTEGER PRIMARY KEY, info BLOB);

You could create a VIEW that extracts value in your preferred multi-column form (and the VIEW could have triggers that support inserts and updates in the multi-column form, but those triggers would probably not be as fast as building the BLOB in your application).

Saving 2 bytes out of 17 might give you a 10% or more speed boost.

(17) By Avinash (avinash) on 2021-05-11 15:13:39 in reply to 13 [link] [source]

This is such a great and clever idea! How come I didn't think of this!

So, I took my best performing code, just updated the file path to :memory:. It took almost same time:

33.08 real
31.58 user
0.72 sys

So I guess this is the best I could do.

(18) By skywalk on 2021-05-12 12:55:26 in reply to 3 [link] [source]

How does WITHOUT ROWID affect performance?

(19) By Ryan Smith (cuz) on 2021-05-12 13:06:44 in reply to 18 [source]

WITHOUT ROWID simply uses another typically heavier Index as the primary key in stead of the rowid. Unless this index is also an INTEGER PK it is guaranteed to be slower, and if it is an Integer PK, it should be close to the same speed as the row id, but possibly still marginally slower.

The WITHOUT ROWID optimization is really a space-saver where another PK must be present anyway. It should not be any faster[1].

Put it like this: If that was faster, the devs should immediately implement that as the new row_id mechanism!

[1] Note: If the without rowid optimization replaces a previous duo of row_id + PK to now only have the PK, then it is bound to be faster, but simply replacing a rowid with a previously non-existing PK, can never be faster.

(20) By Avinash (avinash) on 2021-05-13 12:05:55 in reply to 14 [link] [source]

Thank you! Your message includes lots of important learnings for me :)

(21) By Avinash (avinash) on 2021-05-13 12:06:43 in reply to 16 [link] [source]

Actually, I would need the columns so that I can query them later. But I will experiment with BLOB as well

(22) By Keith Medcalf (kmedcalf) on 2021-05-13 13:35:29 in reply to 15 [link] [source]

Of course it takes longer if you specify the rowid.

When the rowid is automatically generated it is simply the one greater than the last one present, which is obtained simply by going to the tail of the b-tree (following all the right-hand pointers to the end of the line) and adding one.

If you specify the rowid, then your provided rowid must be looked up in a separate seek operation to decide what to do (succeed or have an abortion).

An extra lookup in the sqlite_sequence table is only required if you used the AUTOINCREMENT keyword, which will make the process even longer.

(23) By anonymous on 2021-11-02 10:43:40 in reply to 1.1 [link] [source]

If you create a module and use a virtual table that just link to your data in memory, then copy this data to your table, you can drastically improve insert speed.

(24.9) By cj (sqlitening) on 2022-03-14 23:18:17 edited from 24.8 in reply to 1.1 [link] [source]

I7-2600K, 32GB ram, Windows 10 32-bit, 512GB Samsung 860 SSD.
Can't come close to 3.3 million per second!
Get only 400,000 per second (which I thought was fast.)

FUNCTION PBMAIN AS LONG
 LOCAL s AS STRING
 LOCAL starttime,endtime AS STRING
 LOCAL x,outerloop AS LONG
 KILL   "test.db3"
 slOpen "test.db3","C"

 slexe "PRAGMA journal_mode = OFF"
 slexe "PRAGMA synchronous = 0"
 slexe "PRAGMA cache_size = 1000000"
 slexe "PRAGMA locking_mode = EXCLUSIVE"
 slexe "PRAGMA temp_store = MEMORY"

 s =  "CREATE TABLE IF NOT EXISTS user (
 s += "  id INTEGER not null primary key,
 s += "  area CHAR(6),"
 s += "  age INTEGER not null,"
 s += "  active INTEGER not null)"
 slexe s
 REDIM s(1 TO 3) AS STRING
 REDIM srecord(1 TO 400000) AS STRING
 s(1) = slBuildBindDat("aaaaaa")
 s(2) = slBuildBindDat(STR$(x),"T")
 s(3) = slBuildbindDat(STR$(x),"T")
 s = JOIN$(s(),"")
 FOR x = 1 TO UBOUND(srecord)
  sRecord(x) = s
 NEXT
 s = JOIN$(sRecord(),"")
 starttime = TIME$
 FOR outerloop = 1 TO 1
  slexe "begin exclusive"
  slexebind "insert into user values(null,?,?,?)",s,"V3"
  slexe "end"
 NEXT
 endtime = TIME$
END FUNCTION

(25) By ddevienne on 2022-03-15 08:38:58 in reply to 24.9 [link] [source]

Can't come close to 3.3 million per second! Get only 400,000 per second (which I thought was fast.)

You have to also think in terms of throughput, not just rows/sec.

You can have inflated rows/sec numbers with tiny rows. The OP's is in the 16-bytes range, which is quite small.

Mines is more in the 100-bytes range. And there, I reach 800K rows/sec.
You cannot scale linearly of course, but that's still around 6x more content.
That's of course made-up data, unrealistic data, so not really a real-world use-case.
But in the same ballpark as the OP IMHO.