impact of cache_size

(1) By kasztelan (kasztelan77) on 2021-04-07 21:21:45 [link] [source]

Is cache_size setting supposed to impact only performance and memory usage or is it normal that if the cache_size is in use but not set large enough it can result in out of memory errors? I have a query which happens to select a large number of rows. When the application uses pragma cache_size, that query results in "SQLite error: failed to allocate 268435456 bytes (error code: 7)" when it tries to execute the first sqlite3_step(). If I remove pragma cache_size setting, the query executes successfully. The select is against a single table with three clauses against indexed columns. Is this expected and I simply have to do without cache_size or a value that is large enough to cover the worst case, or should this work regardless of cache_size?

(2) By Keith Medcalf (kmedcalf) on 2021-04-07 23:29:30 in reply to 1 [source]

No, Yes, No.

Insufficient information for meaningful diagnosis.

Minimal information required would include:

the cache size you are trying to set
the page size in effect
the number of pages that need to be read during that "first step"
the total amount of memory in the machine
the amount of memory available to the process
the memory model in use
the Operating System

Without the above information no diagnosis is possible.

From your description it would appear that an attempt is being made to allocate about 256 Megabytes of memory and the operating system has told you to bugger off because the amount of memory requested is not available.

(3) By kasztelan (kasztelan77) on 2021-04-08 19:35:55 in reply to 2 [link] [source]

Thanks, what confuses me is that the exact same code runs the same query fine when PRAGMA cache_size is not explicitly called at startup, so the OS itself has no problem allocating that 256 megs.

Here are answers to your questions:

page size is 8192
cache size is set to 114176
the select matches around 52K record, with an average size of a record about 4k
total amount of RAM in my machine is 16 Gb, but the same problem occurs also on a much beefier server
not much running on the machine, several gigs free, memory ulimits set to unlimited for the process
SLES 15

I expected the cache_size to direct how much of the database SQLite is allowed to keep in memory, not limit the size of any one request.

(4) By Keith Medcalf (kmedcalf) on 2021-04-08 20:02:44 in reply to 3 [link] [source]

Well, no. The cache size is ALWAYS set, just you may specify a value other than the compiled in default. The default (unless you changed it at compile time) is -2000 (meaning about 1.95 Megabytes).

The cache_size setting sets the number of pages (or kilobytes, if the number is negative) of memory that can be used for the application page cache.

Setting the cache_size to 114176 with an 8K page size means the cache size limit is 892 Megabytes.

The number of rows in the projection result is irrelevant to the page cache. If the table contains 1 zillion rows and a table scan is required to answer a query that results in only one match, every single page must still be read to answer the query and that means that the last (cache size setting) MRU pages will be located in the application page cache.

Having 16 GB or RAM is only relevant if the memory model is 64-bit flat, and the process is permitted to allocate that much memory (32-bit processes, for example, are usually limited to 2 GB of virtual arena usage because the upper 2 GB are used to map the OS).

You can test your hypothesis that setting the cache size is causing the problem by setting it to a smaller number. If that works then it is because your OS is limiting memory allocations.

(5) By kasztelan (kasztelan77) on 2021-04-09 18:09:40 in reply to 4 [link] [source]

Thanks a lot for the explanations. Indeed it looks like our problem is too big of the cache_size. When I change it to 16384 pages (128 Mb) or less, this out of memory no longer occurs and the query executes successfully.

(6) By Keith Medcalf (kmedcalf) on 2021-04-09 19:16:18 in reply to 5 [link] [source]

You will note that the page cache (per connection) is not the only user of "dynamic" (allocated from the heap) memory in SQLite3. Each statement also allocates and releases memory, the internal schema structures occupy memory, and memory is used for "temporary working tables" and "sorting".

The amount of memory used for temporary sorting and working space is also controlled to some extent by the cache size setting -- the cache size is not a fixed maximum but rather a "recommendation" (although it does work to limit the number of physical pages in the connection primary cache, it has other effects that may not be obvious).

There are other settings which may be configured to place control on the total amount of address space actually consumed which range from using a custom memory allocator (memsys5 for example) to setting hard and soft allocation limits, setting the "spill size" for the page cache (the number of pages that may be "dirty" before requiring them to be flushed to disk, which is normally unbounded), and whether temp files are entirely memory resident, or entirely file resident (though each temporary file also has a separate in-memory cache even if the residence is set to file).

You will note that the page cache is at the application level and cache's file data within the applications address space. This is independent of the Operating System (or filesystem) level cache which will operate no matter what the page cache size is. The difference is that servicing I/O from the application level cache is performed entirely by the application without making a filesystem/Operating System call, even though that call may be satisfied from the filesystem/OS cache and not perform a physical I/O. The effect of increasing the cache_size is that transitions between 'user' and 'system' and back are avoided if the page I/O can be satisfied from the application cache.

Like all cache's, eventually the overhead of locating data in the cache will exceed the overhead of just doing the I/O (or user to system to user) transition. The "art" is in choosing ostensibly the smallest cache size that provides the maximum benefit.