SQLite Forum


9 forum posts by user srinarasi

2021-10-08
11:51 Post: Understanding memory allocation behavior of SQLite (artifact: ea940f702a user: srinarasi)

Some context:

I have a simple in-memory database with a single table and an index. Here's the schema for the table: CREATE TABLE 'table_name' ('col1' INTEGER, 'col2' INTEGER, 'col3' INTEGER, 'col4' INTEGER, 'col5' STRING, PRIMARY KEY('col1','col2','col3')) WITHOUT ROWID;

The index is on one of the integers and the string. This community helped me improve the insert performance quite a bit already. But I am still not satisfied with the performance of the memory allocator.

My table has 77 million rows and, by my calculation, holds about 2 GB of data. But around 80% of the time is spent in malloc/free. So I plugged in my own allocator functions and found that SQLite asked my allocator to allocate 650 GB in total (and of course also to free most of it), which surprised me. Here's the data of allocation sizes and counts for my table.

I also run sqlite3_db_config(_database, SQLITE_DBCONFIG_LOOKASIDE, NULL, 2048, 2048); before I create the table (the NULL buffer pointer lets SQLite allocate the lookaside buffer itself), in the hope that it reduces the need for smaller allocations. But it doesn't seem to help much.

Using the default allocator on Windows, it takes more than 20 minutes to populate the data, whereas with mimalloc the performance is much better.

No queries or any other SQLite operations are happening during the test.

Here are my questions.

  1. Why does SQLite allocate 650 GB for populating a much smaller DB?

  2. Is there any way to reduce it? (Except doing all the memory management myself?)

  3. SQLITE_STATUS_MEMORY_USED indicates that SQLite is using ~6.5 GB, but by my calculations the DB is only 2 GB in size. What could explain this discrepancy?

  4. I also see that the current value of SQLITE_STATUS_MEMORY_USED almost doubles after the last row has been written to the DB. Why could that be?

Regards,

Bhargava Srinarasi

2021-08-12
13:43 Reply: Increasing insert performance of an in-memory db (artifact: 3f3416f946 user: srinarasi)

It has to be in a single table, so that's not going to work for us.

10:42 Reply: Increasing insert performance of an in-memory db (artifact: 8168333f4a user: srinarasi)

No reads are happening during the test. I played around with WAL mode and synchronous, and it helped a lot. I can now insert 250,000 rows/sec. Thanks!

10:41 Reply: Increasing insert performance of an in-memory db (artifact: 355303a475 user: srinarasi)

I used the WITHOUT ROWID option, WAL journaling mode and synchronous = 0, and played around with SQLITE_DEFAULT_PCACHE_INITSZ and SQLITE_DEFAULT_PAGE_SIZE. I can now insert ~250,000 rows/sec. I wonder if I can push it to 1,000,000 rows/sec :).
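The combination above can be sketched in a few lines; this is a minimal stand-in using Python's stdlib sqlite3 module rather than our C++ code, with a hypothetical table, and it only shows the shape of the approach (note that WAL only applies to file-backed databases; a pure in-memory DB reports journal_mode = memory):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# synchronous = OFF skips fsync bookkeeping; it matters for file-backed DBs,
# and is harmless (a no-op) for a pure in-memory DB.
conn.execute("PRAGMA synchronous = OFF")
conn.execute("""
    CREATE TABLE t (
        col1 INTEGER, col2 INTEGER, col3 INTEGER,
        col4 INTEGER, col5 TEXT,
        PRIMARY KEY (col1, col2, col3)
    ) WITHOUT ROWID
""")

# One batch of rows, already in primary-key order.
rows = [(i, 0, 0, i, "x" * 70) for i in range(50_000)]

# A single transaction for the whole batch: one commit instead of 50,000.
with conn:
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", rows)

n = conn.execute("SELECT count(*) FROM t").fetchone()[0]
print(n)
```

The key lever here is the one-transaction-per-batch pattern (`with conn:` commits once at the end); per-row commits dominate the cost otherwise.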

07:48 Reply: Increasing insert performance of an in-memory db (artifact: 55d78edc86 user: srinarasi)

  1. It's not a WITHOUT ROWID table. Does that have an impact on insert performance?
  2. The strings are between 0 and 10,000 bytes long; on average they are around 70 bytes. I use SQLITE_STATIC when binding them to my prepared statement.

07:45 Reply: Increasing insert performance of an in-memory db (artifact: 8c8839253b user: srinarasi)

Thanks for this! The inserts were not in primary key order. Changing my code to insert in primary key order improved performance by around 80%. Now memcpy() takes around 15% of the time; malloc() takes around 75%.
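The change amounts to sorting each batch by the composite key before binding. A minimal sketch of the idea, using Python's stdlib sqlite3 and a made-up table (INSERT OR IGNORE is only there because the random sample can contain duplicate keys):

```python
import sqlite3
import random

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE t (
        col1 INTEGER, col2 INTEGER, col3 INTEGER, payload TEXT,
        PRIMARY KEY (col1, col2, col3)
    ) WITHOUT ROWID
""")

rows = [
    (random.randrange(1_000_000), random.randrange(100), random.randrange(100), "x" * 70)
    for _ in range(10_000)
]
# Sort by the composite primary key so each insert lands at (or near) the
# right edge of the b-tree instead of splitting interior pages at random
# positions -- that is where the ~80% improvement comes from.
rows.sort(key=lambda r: (r[0], r[1], r[2]))

with conn:
    conn.executemany("INSERT OR IGNORE INTO t VALUES (?, ?, ?, ?)", rows)
```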

I guess the next step is to play around with page_size to see if I can do fewer mallocs. The documentation says that page_size needs to be set before the database is created. But how do I do that in the case of an in-memory DB?

Do you have any other ideas about speeding up memory allocation?
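(For reference: PRAGMA page_size also works on an in-memory database, as long as it runs while the database is still empty, i.e. before the first table is created. A minimal sketch with Python's stdlib sqlite3 module:)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# page_size only takes effect while the database is empty,
# so issue the pragma before creating any tables.
conn.execute("PRAGMA page_size = 32768")
conn.execute("CREATE TABLE t (a INTEGER PRIMARY KEY, b TEXT)")

page_size = conn.execute("PRAGMA page_size").fetchone()[0]
print(page_size)
```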

2021-08-11
09:25 Reply: Increasing insert performance of an in-memory db (artifact: 60329ae966 user: srinarasi)

That's a good point. I tried it on a real i7 core as well. The profile looks similar. memcpy() is the bottleneck and the insertion rate is ~18,000 rows/sec.

09:24 Reply: Increasing insert performance of an in-memory db (artifact: ef651c17ec user: srinarasi)

I'm already using a single inserter thread. I played around with multiple inserters but that didn't really improve things much (as expected).

My transaction size is 50,000. Since the bottleneck seems to be memcpy(), do you think tweaking transaction size helps much?

My journal mode is OFF and the database is in-memory. Looking at the documentation, I think synchronous matters only for writing to the disk.

2021-08-10
08:29 Post: Increasing insert performance of an in-memory db (artifact: 96f5882ba7 user: srinarasi)

Some context

We have an in-memory database with a simple table (3 integers and a string; the first 3 integers form the primary key; no other indexes at this point). We need to create the DB from scratch on launch and populate the table with 100 million rows. The DB grows to ~9 GB once fully populated. Currently this takes around 90 minutes, which means the insertion rate is ~18,000 rows/second.

Our code is written in C++. I have tried everything in this thread. My machine is an Azure VM with 12 cores and 128 GB of RAM. We compile SQLite with SQLITE_WIN32_MALLOC and SQLITE_WIN32_HEAP_CREATE.

On profiling:

I see that around 97% of the time is spent in sqlite3BtreeMovetoUnpacked(), and around 75% of the time in memcpy_repmovs().

Questions

  1. Why does memcpy() hog so much of the CPU when we're only storing ~1.5 MB/sec? Is this normal?
  2. For storing around 100 million rows of roughly 90 bytes each, what would you expect the best-case insertion rate to be? I'm trying to set my own expectations for what we can achieve with our current approach.
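For question 2, a small measurement harness makes the rows/sec number easy to reproduce on any machine. This is a hedged sketch in Python's stdlib sqlite3 (not our C++ code), scaled down from 100 million rows and using a made-up table with the same shape: three-integer primary key plus a string of ~90 bytes per row:

```python
import sqlite3
import time

N = 200_000  # scaled down from 100 million for a quick measurement

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE t (
        a INTEGER, b INTEGER, c INTEGER, s TEXT,
        PRIMARY KEY (a, b, c)
    )
""")

# Rows in primary-key order, roughly 90 bytes each (3 ints + 78-byte string).
rows = ((i, i, i, "x" * 78) for i in range(N))

start = time.perf_counter()
with conn:  # one transaction for the whole batch
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
elapsed = time.perf_counter() - start

rate = N / elapsed
print(f"{rate:,.0f} rows/sec")
```

The absolute number depends entirely on the machine and bindings; the point is to have a repeatable baseline to compare tweaks (transaction size, key order, page size) against.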

Thanks,

Bhargava