SQLite User Forum


SQLite PRAGMA configurations not causing noticeable performance increase compared to default PRAGMA configurations

(1.1) By Swastik (961679) on 2022-07-30 17:19:21 edited from 1.0 [link] [source]

I have a SQLite DB of size ~130MB set up on an Amazon EC2 t4g.medium (2 CPU, 4GB RAM, 500GB SSD) running CentOS.

I want to use SQLite as a cache in front of MariaDB, so I'm running some tests to check the performance.

Based on the discussion and suggestions in reply 13 of this post, https://sqlite.org/forum/forumpost/18426c64b3, I've configured SQLite with the following PRAGMA values:

    journal_mode - WAL
    locking_mode - EXCLUSIVE
    cache_size - 128000 (~500MB, larger than my DB file size so that all pages are in cache)
    temp_store - MEMORY
    synchronous - OFF
    soft_heap_limit - 0
    cell_size_check - 0
    ignore_check_constraints - 1
    threads - 8
    shared_cache - true
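My app is Java/JDBC, but for reference, here is a minimal sketch of how the PRAGMAs above get applied, using Python's built-in sqlite3 module and a throwaway file path (the PRAGMA statements themselves are the same SQL regardless of driver). Note that journal_mode persists in the database file once set to WAL, while the others are per-connection and must be reissued on every new connection:

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "cache.db")

conn = sqlite3.connect(db_path)
# journal_mode=WAL is stored in the file; the remaining PRAGMAs only
# affect this connection and must be reapplied for each new one.
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("PRAGMA cache_size=128000")  # positive value = pages, not bytes
conn.execute("PRAGMA temp_store=MEMORY")
conn.execute("PRAGMA synchronous=OFF")

# Query the PRAGMAs back to verify what actually took effect,
# rather than assuming the statements were honored.
mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
size = conn.execute("PRAGMA cache_size").fetchone()[0]
print(mode, size)  # -> wal 128000
```

Reading the values back at the end is worth the two extra lines: some PRAGMAs are silently ignored when unsupported, so a benchmark comparison is only meaningful if the configured values are confirmed to be in effect.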


When I tested the performance of SQLite with both the default PRAGMA values and the values above, there was no noticeable performance increase over the defaults.

What would be the reason for this? 

In particular, the large cache_size is not making queries faster, even after bringing all database pages into the cache (I do a SELECT * on every table to warm the cache).

One reason I thought of: since the SQLite DB is on an SSD, the time taken to read pages from disk (SSD) and the time taken to read pages from memory (cache) are almost the same. I may be completely wrong here!


Note: I'm using the xerial sqlite-jdbc driver to connect to SQLite from my Java app.

Thanks

(2) By anonymous on 2022-07-30 19:29:09 in reply to 1.1 [link] [source]

I am not sure why you are setting shared cache and exclusive locking at the same time. I recommend removing shared cache altogether, as it will add unneeded overhead in your case.

That aside, are you using some form of ORM in your Java application? It could be that object creation in the ORM is dominating the test time.

(3) By Simon Slavin (slavin) on 2022-07-31 00:44:24 in reply to 1.1 [link] [source]

A couple of issues:

We don't know what the default PRAGMA settings are for your installation (they can be set using compile-time options), so we don't know which settings differ from the ones you've been testing against. You can use PRAGMA commands to find out what the current settings are, and make a table showing the default settings vs. your new settings.

Setting a huge cache might look like a good idea, but we don't know how your OS is handling a process trying to grab so much memory. A normal strategy for an OS is to just page that cache out to virtual memory, which defeats the point of the cache. You will need to use your OS's utilities to see what it's doing when your program is using a big chunk of the cache you set.

(4) By Swastik (961679) on 2022-07-31 04:46:48 in reply to 2 [link] [source]

Oh, my bad. I was testing by adding and commenting out different PRAGMA values, and forgot to remove locking_mode when I copy-pasted here; you can completely ignore locking_mode.

As for shared cache: the reason I used it is that multiple connections can then access the same cache, created by the first database connection on app start-up (the first connection performs a SELECT * on all tables to bring all pages into the cache). Please correct me if I'm wrong here.

As for the ORM, I'm using Hibernate, but it isn't causing the issue: I've also tested without Hibernate, using plain JDBC, and saw the same results.

(5.1) By Swastik (961679) on 2022-07-31 17:01:57 edited from 5.0 in reply to 3 [link] [source]

1. Here are the default values and my values.


PRAGMA                        DEFAULT             MY_CONFIG

journal_mode                  memory              wal
cache_size                    -2000 (~2MB)        128000 (~500MB)
temp_store                    0 (file)            2 (MEMORY)
synchronous                   FULL                OFF
soft_heap_limit               0                   0
cell_size_check               0                   0
ignore_check_constraints      0                   1
threads                       0                   8

I need to check how the OS is handling it, thank you.

Suppose the OS is keeping pages in virtual memory instead of main memory, as you said. Is there any way in SQLite to tell the OS to keep all cache pages in main memory?

(6) By anonymous on 2022-07-31 11:54:32 in reply to 4 [link] [source]

If I remember correctly, Dan mentioned on this forum that shared cache was introduced for severely memory-constrained environments and is not recommended for general use. I am sure it makes memory access slower, but I'm not sure whether that also applies to read-only scenarios.

From my own testing, the config that worked best for multiple reader connections was mmap. It always provided the best performance for me. YMMV, as my environment had writers along with the readers.

Also note that when you rely on SQLite's page cache, a single write will invalidate all the cache you have accumulated in memory; not so with mmap. I am not sure how that works in shared-cache mode, though. It may not invalidate, unless of course the change comes from a non-shared-cache connection (outside the process, perhaps?).

As for the ORM, I only bring it up because I have been bitten by something similar before. Just ensure that whatever you are doing in Java land is not significant enough to influence the outcome of your benchmarking.

If I wanted to understand what works best on my data, I would try it first in a neutral environment, maybe just using the CLI, .timer, and a script of SQL queries. After that I would test in my own specific environment and see whether the results match.
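The same idea also works outside the CLI. Here is a minimal sketch of such a neutral timing harness in Python's built-in sqlite3 module, with a made-up table `t` standing in for real data (the OP would run equivalent queries through JDBC; the point is only to time bare SQL with nothing else in the loop):

```python
import sqlite3
import time

# A small throwaway table so the timing loop has something to scan.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(id INTEGER PRIMARY KEY, v TEXT)")
conn.executemany("INSERT INTO t(v) VALUES (?)",
                 [(f"row{i}",) for i in range(100_000)])

def timed(sql, runs=5):
    """Run `sql` several times and report the best wall-clock time,
    which filters out one-off noise such as a cold cache."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return best

print(f"full scan:    {timed('SELECT * FROM t'):.4f}s")
print(f"point lookup: {timed('SELECT v FROM t WHERE id = 12345'):.6f}s")
```

Comparing the default configuration against the tuned one is then just a matter of running the same harness twice with the PRAGMAs applied before the timing loop, keeping everything else identical.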

(7) By Swastik (961679) on 2022-07-31 12:55:56 in reply to 6 [link] [source]

I don't think shared_cache is causing the problem in my case, because I've also tested with shared_cache commented out.

In my case, too, I have multiple readers and a single writer. I wanted to make reads faster, so I used a large cache to achieve that. If you feel mmap_size will help achieve faster reads, is there any rule of thumb for choosing the mmap_size PRAGMA value given a DB file size?

(8) By Simon Slavin (slavin) on 2022-07-31 14:58:54 in reply to 5.0 [source]

Your OS will make choices on which parts of memory to keep in real memory based on how often parts of memory are used, and how active it thinks each program is. These choices are usually very good, and far better than a human could make.

So test out what your OS is doing, and how much delay it's causing, before deciding that it is doing something wrong. If you do need to change what the OS is doing, how to do it (if it's possible at all) depends on which OS you're using. I only know that kind of detail about Macs so I probably can't help you. Sorry.

(9.1) By Keith Medcalf (kmedcalf) on 2022-07-31 16:43:13 edited from 9.0 in reply to 5.0 [link] [source]

The default cache size should be -2000 which is about 2 MB. Unless you changed it, of course.

None of those changes will have any appreciable effect on READING the database, except for the negligible increase in cache size and perhaps allowing multiple threads to be used for sorting and restricting temp to memory.

For updates, ignore_check_constraints will only turn off any check constraints. If you do not want check constraints, simply not coding them into the table definitions might be perspicacious. Otherwise, not much difference.

(10) By Keith Medcalf (kmedcalf) on 2022-07-31 16:40:59 in reply to 1.1 [link] [source]

"One reason I thought of: since the SQLite DB is on an SSD, the time taken to read pages from disk (SSD) and the time taken to read pages from memory (cache) are almost the same. I may be completely wrong here!"

You are incorrect here, by about three orders of magnitude.

Reading a "bunch-o-bytes" from RAM is 1000 times (or more) faster than reading those same bytes from an SSD. The RAM path requires only one memory read from the application cache (or one read from the system cache, plus one write to and one read from the application cache), while the SSD path requires at least one device read, one memory write, one memory read, and one memory write. If this were not so, there would be no point in having expensive RAM, would there?

The best way to optimize I/O is to not do it. It has been thus since the beginning of time.

(11) By Swastik (961679) on 2022-07-31 16:49:00 in reply to 9.0 [link] [source]

"except for the negligible increase in cache size and perhaps allowing multiple threads to be used for sorting and restricting temp to memory"

I agree,

This is where I'm not able to find a noticeable difference in speed: both the default configuration and my configuration take almost the same time to execute the queries.

(12) By Swastik (961679) on 2022-07-31 17:17:04 in reply to 10 [link] [source]

Agreed!

(13) By anonymous on 2022-08-01 12:51:52 in reply to 7 [link] [source]

The mmap_size should match, or be a little higher than, the size of your active page set. If you are accessing all your records in a completely random fashion and your memory is large enough, you should set mmap_size to something that can hold the entire database. The mmap_size is an upper limit; the system will not allocate that much memory until it is all requested for read/write.
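For a whole-database map, that rule of thumb can be sketched as follows, again using Python's built-in sqlite3 module for illustration (the path, table, and 16MB headroom figure are made up for the example). Reading the PRAGMA's return value matters here, because some builds cap or disable memory-mapped I/O:

```python
import os
import sqlite3
import tempfile

# Create a small throwaway database file to size the map against.
db_path = os.path.join(tempfile.mkdtemp(), "data.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE t(x)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])
conn.commit()
conn.close()

# Size the map to the current file size plus a little headroom for
# near-term growth; mmap_size is an upper bound, not an allocation.
file_size = os.path.getsize(db_path)
mmap_bytes = file_size + (16 << 20)  # file size + 16MB headroom (arbitrary)

conn = sqlite3.connect(db_path)
# The PRAGMA returns the limit actually in effect, which may be smaller
# than requested (or 0 if mmap is unavailable on this build/platform).
granted = conn.execute(f"PRAGMA mmap_size={mmap_bytes}").fetchone()[0]
print(granted)
rows = conn.execute("SELECT count(*) FROM t").fetchone()[0]
```

Because the setting is per-connection, it has to be issued on every connection that should read through the map, just like cache_size.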