SQLite Forum

Faster way to insert into WITHOUT ROWID table?

(1) By Matthijs Tijink (mtijink) on 2021-11-25 10:02:46

Hi all,

I have a table like this:

CREATE TABLE data (a INTEGER NOT NULL, b INTEGER NOT NULL, c INTEGER, d INTEGER, PRIMARY KEY (a, b)) WITHOUT ROWID;

I want to fill this table as fast as possible with ~20 million rows, so I do some of the usual stuff (one big transaction, prepared statements). Inserting in order also helps a bit. This takes ~19-23 seconds on my machine.

However, this is still roughly 3-4x slower than inserting in a table without primary key and with rowid, which takes ~5-6 seconds. In both cases SQLite is bottlenecked on CPU, not on disk.
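The setup described above can be sketched with Python's sqlite3 module (the schema is from the post; the row count and column values here are illustrative, scaled down from the ~20 million rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (a INTEGER NOT NULL, b INTEGER NOT NULL, "
    "c INTEGER, d INTEGER, PRIMARY KEY (a, b)) WITHOUT ROWID"
)

# One big transaction, a single parameterized statement, and rows
# generated in primary-key order.
rows = ((a, b, a + b, a * b) for a in range(100) for b in range(100))
with conn:  # implicit BEGIN ... COMMIT around the bulk insert
    conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM data").fetchone()[0]
print(count)  # 10000
```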

I tried a couple of things already, but they didn't help much:

  1. Using an in-memory database. Helps a bit, but still slower than without the primary key on-disk.
  2. Not using a journal. Didn't do a lot, presumably because the table was empty initially.
  3. Inserting all data into an identical in-memory temporary "data2" table, then using INSERT INTO data SELECT FROM data2. Ideally, SQLite would see that the data is already sorted and unique and skip some of the work. This was intended to simulate what would happen if I created a virtual table. It was roughly as fast as inserting directly into the "data" table.
  4. Creating and filling a normal table and creating the index afterwards. Not ideal, because it produces a larger database and is slower when reading. And it turns out to be slower than the original approach anyway.
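Approach 3 can be expressed as follows (a sketch using Python's sqlite3; the staging-table name "data2" is from the post, the data itself is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (a INTEGER NOT NULL, b INTEGER NOT NULL, "
    "c INTEGER, d INTEGER, PRIMARY KEY (a, b)) WITHOUT ROWID"
)
# Stage the rows in a TEMP table, already sorted and unique on (a, b),
# then bulk-copy into the target table.
conn.execute(
    "CREATE TEMP TABLE data2 (a INTEGER, b INTEGER, c INTEGER, d INTEGER)"
)
rows = ((a, b, 0, 0) for a in range(50) for b in range(50))
with conn:
    conn.executemany("INSERT INTO data2 VALUES (?, ?, ?, ?)", rows)
    conn.execute("INSERT INTO data SELECT * FROM data2")

n = conn.execute("SELECT COUNT(*) FROM data").fetchone()[0]
print(n)  # 2500
```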

Are there approaches I'm missing? Things I can somehow tell to SQLite to make this faster?

Or is there a way to improve SQLite for this usecase? Ideally approach 3 (with a virtual table) can skip a lot of the insertion work.

(2) By ddevienne on 2021-11-25 12:04:56 in reply to 1

You haven't said how that DB / table is used later. But if your use case is really as described,
i.e. only fixed-size columns, then you can have random access into your data, by direct indexing,
whether it's on-disk or in-memory, if you implement your own virtual table (and indexing) on top of
that regular / tabular data.

It might end up bigger than the SQLite DB, which uses varints, if your integers are small.
But the extra space is worth it, for direct random-access addressing, performance-wise.
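The direct-indexing idea above (not the virtual table itself) can be illustrated with fixed-size records: because every row occupies the same number of bytes, a row's offset is a simple multiplication, with no B-tree lookup. This is a hypothetical sketch, assuming four 64-bit integers per row:

```python
import struct

# Fixed-size records: four little-endian 64-bit integers per row, so a
# row's byte offset is simply index * RECORD.size.
RECORD = struct.Struct("<4q")

rows = [(a, b, a + b, a * b) for a in range(10) for b in range(10)]
buf = b"".join(RECORD.pack(*r) for r in rows)

def get_row(index):
    # Direct random access: compute the offset and decode in place.
    return RECORD.unpack_from(buf, index * RECORD.size)

print(get_row(23))  # (2, 3, 5, 6)
```

Unlike SQLite's varint encoding, every row here costs 32 bytes regardless of the values stored, which is the space-for-speed trade-off described above.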

A long time ago, I measured vtables over C++ std::unordered_map-like containers of structs
as being 5x faster than equivalent regular in-memory SQLite tables. And that was even w/o direct
addressing like in your use-case (node-based hash map, so each row is a separate alloc / address).

Short of the above, you've covered all the bases, from what you wrote. FWIW. --DD

(3) By Matthijs Tijink (mtijink) on 2021-11-25 15:51:29 in reply to 2

Thank you for your reply, good to know.

It's intended for data storage and transfer to other systems, for which SQLite is excellent. The other side might read the data into some in-memory structure of course, but I do need to store the data in regular tables first.

A virtual table approach would work for generating the data (although it's compute-heavy, not completely random access). However, my experiment above seems to show that INSERT INTO ... SELECT FROM a virtual table is no faster than just inserting directly.

(4) By Keith Medcalf (kmedcalf) on 2021-11-25 17:58:18 in reply to 1

Have you increased the cache size?
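For reference, the page cache can be enlarged with PRAGMA cache_size; a sketch in Python's sqlite3 (the value below is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A negative value sets the cache size in KiB: -1000000 asks for roughly
# 1 GB of page cache (the default is -2000, i.e. about 2 MB).
conn.execute("PRAGMA cache_size = -1000000")
val = conn.execute("PRAGMA cache_size").fetchone()[0]
print(val)  # -1000000
```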

(5) By Matthijs Tijink (mtijink) on 2021-11-26 09:17:08 in reply to 4

Yes, but that does not make a difference. I'm not surprised, since SQLite is CPU-bound in this case.

(6) By Richard Hipp (drh) on 2021-11-26 11:26:40 in reply to 5

How are you executing the SQL that does the inserting? Are you preparing a new SQL statement for each row? Or do you prepare a single INSERT statement using parameters, then bind data to the parameters and rerun that one statement for each row?
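The distinction being asked about can be sketched in Python's sqlite3 (which internally caches a prepared statement when the same SQL string is reused); the table and values are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")

# Slow pattern: a fresh SQL string per row forces a re-parse/re-prepare
# of the statement for every insert.
for a in range(3):
    conn.execute(f"INSERT INTO t VALUES ({a}, {a * 2})")

# Fast pattern: one parameterized statement, prepared once, with new
# values bound for each row.
conn.executemany(
    "INSERT INTO t VALUES (?, ?)", ((a, a * 2) for a in range(3, 6))
)

total = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(total)  # 6
```

In the C API the same distinction is between calling sqlite3_prepare_v2() per row versus preparing once and looping over sqlite3_bind_*() / sqlite3_step() / sqlite3_reset().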