SQLite Forum

Suggestion Re Statement Cache

(1) By Larry Brasfield (larrybr) on 2023-09-05 22:21:16

Recently, I had occasion to think about scenarios where somebody wishes to execute the same SQL repeatedly, and wishes to avoid the resource usage associated with compiling the SQL into a VDBE instruction sequence and storing that redundantly. Or, more trickily, running the same DQL or DML from multiple threads.

I have also seen words on prepared statement caching, something that the Tcl interface does now at the library client level, and the possibility that the SQLite library might do the same some day.

These two (or three) concepts, after stewing in my skull for a while, led to this proposal:

Perhaps there should be no prepared statement caching. Instead, what is cached is compiled VDBE programs. Prepared statements, if created with SQL that is in the VDBE program cache (as a key), get something from the cache instead of the compiler being run. That something might be a copy, or it might even be a reference so that recompiles (due to schema changes) update all dependent prepared statements at the same time. Or, this could be done like a lazy write, updating VDBE sequences that are not in execution but leaving ones alone that are amidst execution.

This would, I submit, reduce motivation for applications to be keeping prepared statements lying around or to be trying to use a single prepared statement from multiple threads with the synchronization that entails.

The registers and other execution related data kept with prepared statements would be created at need, cheaply I expect. But recompilation of SQL could be avoided by use of a VDBE program cache. Its size could be tunable, or sqlite3_prepare_v3() could have a flag value indicating that the given SQL is to be cached.

(2) By Warren Young (wyoung) on 2023-09-06 06:35:23 in reply to 1

Pending benchmarking that proves my intuition wrong, that sounds like an overall pessimization. Looking up a prepared statement in a cache by a static text key is fast, to the point that it should be possible to do it at program compile time given a suitably clever data structure and a smart-enough language compiler.1

Your alternative has broader application in that it works from any language, over any API, but it adds the runtime cost of SQL compilation to the cache lookup time. I don’t see how any current language compiler could reasonably be expected to optimize that out at compilation time.

If you’re willing to do the work, and it’s optional, I have no objection. I do think that if you implement this, benchmarking will show it to be conditionally faster, aiding the likes of Tcl users while being valuable in highly optimized statically compiled languages2 only when they’re being used in a “scripting” style.3


  1. ^ Never tried it, never needed it, but I’ve seen modern C++ compilers pull off wonders…
  2. ^ C, C++, Rust, Swift…
  3. ^ As opposed to the cycle-bumming coding style used in high-volume applications, where developer time trades off favorably against large-N multipliers: big data centers, embedded, computational finance, games, AI, etc.

(3) By Rowan Worth (sqweek) on 2023-09-06 06:59:10 in reply to 2

I don't think Larry was suggesting changing the interface to the cache:

Prepared statements, if created with SQL that is in the VDBE program cache (as a key), get something from the cache instead of the compiler being run.

ie. the cache is still keyed based on SQL text, there's no compilation involved in a cache lookup.

(the nuance between caching a prepared statement vs a VDBE program is lost on me, which may also indicate that I have misunderstood the proposal)

(4) By Larry Brasfield (larrybr) on 2023-09-06 08:29:02 in reply to 3

nuance between caching a prepared statement vs a VDBE program

For purposes of this discussion, a sqlite3_stmt object contains several kinds of state: (1) A sequence of VDBE instructions which the SQL compiler produces and which may be interpreted by the VDBE (aka "virtual database engine")1; (2) A set of bound parameter values, possibly empty; (3) A set of registers which are used as intermediate or output data values during execution of the VDBE instructions; and (4) An instruction pointer.

The 1st kind of state is determined almost solely by the SQL presented2, during the sqlite3_prepare() call. This can be expensive in terms of execution time.

The 2nd kind of state (parameter binding) is logically part of client usage of the prepared statement, while the 3rd and 4th kinds are developed during sqlite3_step() calls.

The 1st kind of state (VDBE instructions) can be shared among threads with minimal precautions needed to avoid non-deterministic behavior. Such sharing would allow the expense of SQL compilation to be avoided where the same SQL is being used repeatedly.

The 2nd-4th kinds of state are necessarily associated with execution of the VDBE and generally should not be shared among threads. Threads need to keep their own VDBE execution state to avoid interfering with each other when they independently run prepared statements.

My suggestion (or proposal) is that, within the library implementation, without substantial exposure at the API, VDBE programs can be cached to serve two distinct purposes: (1) execution time improvement where the same SQL is being run a lot, as in the Tcl adapter; and (2) memory savings, where large VDBE programs can be reused rather than replicated.

I don't think Larry was suggesting changing the interface to the cache:

Right. It's a key/value store where the keys are whole SQL statements and the values are VDBE programs. It resembles a prepared statement cache, except only the 1st kind of prepared statement state is kept.


  1. ^ These are what is shown via "EXPLAIN ...".
  2. ^ The VDBE instructions can be affected by collected statistics during optimization, so "solely" is a slight stretch.

(5) By Donal Fellows (dkfellows) on 2023-09-06 10:23:13 in reply to 4

The main problem with having a cache (of anything) is working out when to invalidate the cache. That's often difficult to get right (speaking as someone who's written a number of the caching mechanisms inside Tcl).

Under what circumstances would a cached VDBE program cease to be valid?

(6) By Gunter Hick (gunter_hick) on 2023-09-06 11:05:49 in reply to 5

At the latest when it needs to be re-prepared because the schema has changed. Which makes it hard to share the VDBE code, because it is presumably quite unsafe to replace the VDBE code out from under a "running program". Which means you need a reference count so that the "old version" can be freed once every "running program" has been stopped.

(8) By Larry Brasfield (larrybr) on 2023-09-06 19:52:05 in reply to 5

Under what circumstances would a cached VDBE program cease to be valid?

I would share them using a reference-counting scheme, and those whose count reaches 0 would be subject to a least-recently-used flushing algorithm. The number of not-presently-used VDBE programs (in the cache only) would be tunable.

(13) By Roger Binns (rogerbinns) on 2023-09-08 15:24:52 in reply to 8

(Author of a Python SQLite wrapper that has its own statement cache.)

I would be delighted to delete my statement cache code, and instead rely on one built in to SQLite. Every other SQLite wrapper in every language seems to implement a statement cache so there is definitely a need there, and I have asked for one before.

I don't think you need to do anything as complicated as is being proposed. My approach is:

  • When sqlite3_prepare_v3 would be called, look in the cache first and return something from there if the SQL text matches. (It is taken out of the cache, leaving a hole.)
  • When sqlite3_finalize would be called, put the sqlite3_stmt back into the cache evicting a member if the cache is at capacity. The evicted member is actually finalized. Bindings are cleared for the statement put into the cache to free memory.

My original implementation used a dictionary/hash table for the cache. The drawback is that you can have only one cached statement per key, and it is common for the same SQL text to be active multiple times.

My second implementation is a simple array of ints holding the hash values of the SQL text. When looking for an entry, a linear scan finds a matching hash value, with a secondary array containing structures holding the SQL text, length, and sqlite3_stmt. Linear scans are fast on modern processors, especially compared to the random-access pointer chasing in various tree structures. There is a magic hash value meaning unoccupied, and an incrementing index tracking which entry is next to be evicted.

This approach doesn't need any reference counting or similar overhauls, and works well.

Some other minor details:

  • The entire text passed to prepare is used as the key, not just the first statement
  • When the text is larger than 16KB there is no caching. The number was determined empirically and keeps something like restoring a dump from doing lots of pointless cache management
  • The prepare flags also have to match, in addition to the SQL text
  • Explain mode is also treated like a prepare flag, and I only expose it as though it were one
  • There is a no-cache prepare option because some queries want that, such as a pragma setting a password. Also, authorizers only fire during prepare, so their use requires no cache.
  • I provide an API for getting cache statistics and entries, which helps validate that the cache is working

It is all these details that are tricky and why I'd prefer SQLite has this as optional functionality since it is one place to get all the nuances right, instead of each wrapper having their own quirks and bugs.

(14) By Larry Brasfield (larrybr) on 2023-09-08 17:42:06 in reply to 13

I don't think you need to do anything as complicated as is being proposed. ...

I do not see caching only the reusable and semi-constant part(s) of a prepared statement object as being significantly more complicated than caching whole prepared statements along with their execution support parts which have to be initialized anyway upon reuse.

Clearly, for users of the API as it exists today, there is little choice. But what I am suggesting is a way to get the benefit of whole-prepared-statement caching with an optimization that benefits multiple, simultaneous uses of the same SQL.

It is all these details that are tricky ...

No argument on that.

(15) By Roger Binns (rogerbinns) on 2023-09-08 18:01:49 in reply to 14

... as being significantly more complicated than caching whole prepared statements ...

What I meant is that doing even simplistic caching is sufficient, and what everyone has independently developed today. That alone would provide a lot of benefit, especially getting the tricky details right. Having a SQLITE_PREPARE_CAN_CACHE flag to sqlite3_prepare_v3 would be the only needed change to the SQLite API.

As you point out, behind the scenes SQLite can then use reference counts, break the statement data structure into parts etc, but none of that would need to be exposed, nor have public APIs. It is only if the SQLite team doesn't implement the cache (as has been the case for 20 years) that some machinery could be exposed. But it would expose ugly implementation details, and it would be strongly preferred the team do this, get it right, and let all the wrappers delete code that almost certainly has bugs.

(16) By Larry Brasfield (larrybr) on 2023-09-08 18:49:51 in reply to 15

I pretty much agree with this.

Much of what I am trying to bring out in this thread is that, by doing the caching within the library, it can be done in a way which yields more performance benefits than are available to clients doing whole-sqlite3_stmt caching.

(17) By Pepijn Van Eeckhoudt (pepijnve) on 2023-09-09 13:51:33 in reply to 16

by doing the caching within the library

I can't justify it properly, but I feel like I would prefer to control this type of caching from the application level. The library handling it automatically as a replacement for full statement caching, while convenient, seems to take away some control from the application developer and seems to make the performance behaviour of the library less predictable?

A couple of questions come to mind:

Would the cache be global state or tied to the database connection?

How is cache eviction controlled? How can I guarantee sharing for some performance critical statements for instance if it's a transparent global cache with a size upper bound? Or is it unbounded, but using reference counting which would mean I still need to hold on to at least one statement instance in my application where performance matters.

Related to this, are you planning on providing a kind of global opt-out? An OMIT... compile flag for instance?

it can be done in a way which yields more performance benefits than are available to clients doing whole-sqlite3_stmt caching

Could you elaborate on that a bit? I would like to understand what could be gained besides reduced memory usage.

(20) By David Jones (vman59) on 2023-09-10 20:31:28 in reply to 17

I agree with this. If you have a performance sensitive application and use the SQLite API, I'd expect you to code prudently when it comes to repeated queries using the same SQL. Wrapper libraries, OTOH, may get themselves into redundant prepares from trying to provide a simplified or encapsulated programming interface to the client. I'd rather see a cache object that can help wrappers with repeat queries than have caching be automatic.

For my usage patterns, the times SQLite is 'too slow' are from poorly optimized execution against large databases, not excessive compilation of the SQL.

(21) By ddevienne on 2023-09-11 07:49:38 in reply to 20

For my usage patterns, the times SQLite is 'too slow' are [...], not excessive compilation of the SQL.

I haven't followed that thread too closely, TBH, but I tend to fall in that camp.

SQLite's SQL parser and comparatively simple NGQP planner are fast enough.

So while a statement cache seems attractive from a theoretical standpoint, especially as Roger puts it (an opt-in flag on _prepare_v3 as the sole API), I'd rather invest Richard's time (if it was my call :)) on other topics (like better blobs).

So I'm +0 on Roger's approach to Statement caching, -1 on others. FWIW.

(23) By Pepijn Van Eeckhoudt (pepijnve) on 2023-09-11 10:21:26 in reply to 21

If there was a kind of caching available, what I would personally like to be able to use is something like this. (Disclaimer: I'm just throwing an idea out there. This isn't a fully baked design; there may be glaring flaws. It's a fleshed out version of the explicit mechanism I was suggesting earlier. Apologies up front if this is a complete brain fart)

(1) Some way to compile SQL to VDBE and a way to release that object.

typedef struct sqlite3_program sqlite3_program;

int sqlite3_compile(
  sqlite3 *db,            /* Database handle */
  const char *zSql,       /* SQL statement, UTF-8 encoded */
  int nByte,              /* Maximum length of zSql in bytes. */
  unsigned int compFlags, /* Zero or more SQLITE_COMPILE_ flags */
  sqlite3_program **ppProgram,  /* OUT: Program handle */
  const char **pzTail     /* OUT: Pointer to unused portion of zSql */
);

int sqlite3_free_program(
  sqlite3_program *pProgram
);

(2) Some way to create a new execution context using the compiled program.

int sqlite3_prepare_v4(
  sqlite3 *db,            /* Database handle */
  sqlite3_program *pProgram, 
  unsigned int prepFlags, /* Zero or more SQLITE_PREPARE_ flags */
  sqlite3_stmt **ppStmt  /* OUT: Statement handle */
);

In other words, make the two types of state (type (1) and types (2, 3, 4) as Larry described it) associated with a prepared statement today explicit. This is similar to the fork suggestion, but you 'fork' from a specific object instead of another statement.

For existing usages of prepare_v2 and v3 nothing would change. A sqlite3_program may be created internally, but if there is one it's hidden to the user and its lifetime is tied to that of the sqlite3_stmt. I.e. the sqlite3_stmt is the owner of the sqlite3_program instead of the user application.

To make automatic recompiling feasible you probably need the reference counting logic that has been hinted at in this thread in the implementation, so that you don't rug-pull in-flight queries. In other words, internally that sqlite3_program may have to be a bit more sophisticated than just a simple opaque pointer to a VDBE program to make automatic recompilation possible. It's more like a little mini-cache for a single SQL string that's user created/managed.

What I don't quite see yet is how you could handle bind parameter dependent VDBE program caching (assuming that's a thing; I might be misinterpreting the documentation). You would want to cache the minimal set of distinct query plans, but if I understand it correctly you would only know that you want to use a different query plan during the optimisation stage of the compiler. This creates a bit of a chicken or egg problem. Perhaps that's just something that can't be cached. I was wondering how this would be handled for the transparent cache as well.

There are a couple of reasons I would like a design along these lines

  • It's opt-in. As others have stated SQLite is plenty fast as is. For situations where performance is already fine I don't have to think about potential implications on the behaviour of a hidden cache. (Avoiding cache thrashing, cache sizing, guaranteeing caching where it's required, etc.) Where I do need every last drop of performance I can get it. For existing users, there's no risk of performance regressions since that all stays the same.
  • It's conceptually similar to, but much less error prone than, prepared statement pooling which many people seem to be doing today already. Pooling is tricky because you're trying to reuse an object that has complex mutable state. You need to make sure you don't use the same instance twice at the same time, you need to make sure the instance is returned to the pool instead of being freed, you need to think about how many instances you keep alive, you need to make sure bound parameters are cleared to not risk leaking state from one usage to the next, etc. With the proposed design sqlite3_stmt can be used in a simple RAII style.
  • It seems simpler for the SQLite developers to implement since they can punt on a bunch of tricky cache design decisions. Those are moved to the user side of the fence instead. I can then apply whatever caching approach makes sense for my usage of the library.

These reasons are probably also the downsides as well though.

  • It's not automatic, so you're not improving performance transparently for all users.
  • It's a new API concept that will have to be maintained in a backwards compatible fashion. I can't judge if this is something you would want to commit to or not. I do think some notion of a compiled statement is sufficiently intrinsic and abstract that that shouldn't be a problem, but that's hard to predict.

(24) By Roger Binns (rogerbinns) on 2023-09-11 14:44:50 in reply to 17

The target of a statement cache is NOT an app developer writing code in C directly using the SQLite API, carefully and tightly controlling everything their code does.

It is the various wrappers for SQLite that provide a higher level of abstraction for other languages like Python, Perl, TCL, Java, C# etc. Those wrappers usually do not expose the low level SQLite statement machinery because there is no benefit, and it is a lot of work. For example in my Python wrapper you simply write:

connection.execute("select ... where ...", (7, "123 Main St")).get

At the C level it has to call sqlite3_prepare, the binding functions, sqlite3_step, the sqlite3_column functions, and sqlite3_finalize. There is no value in higher level languages or libraries in exposing all these underlying details.

And developers are usually using yet another library/layer of abstraction on top of this. They work with ORMs and records and Employee and Invoice with changes and reads working through the layers. That is how you end up with the same SQL statements being repeatedly generated, and why a cache is beneficial.

Would the cache be global state or tied to the database connection?

Connection since that is the scope of a sqlite3_stmt.

How is cache eviction controlled?

LRU is the norm for caching.

How can I guarantee ...

By writing the exact code you want.

The most straight forward implementation inside SQLite would be:

  • SQLITE_CAN_CACHE flag to sqlite3_prepare_v3 which says the sqlite3_stmt could come from a cache, and could be placed back into the cache after use
  • sqlite3_finalize could add the sqlite3_stmt to the cache if SQLITE_CAN_CACHE had been used in the prepare
  • sqlite3_db_config to control cache maximum size
  • SQLITE_OMIT_STATEMENTCACHE to exclude the code at compile time

For reference the statement cache in my Python wrapper is implemented entirely in C and follows this pattern. It is 350 lines of code, of which 50 is a function to get the cache statistics and entries. The cache itself is a simple linear array indexed by hash value of the SQL text, and allows the same SQL text to be present multiple times because that does happen.

it can be done in a way which yields more performance benefits than are available to clients doing whole-sqlite3_stmt caching

Could you elaborate on that a bit?

Using the existing public SQLite API only lets you do caching on the opaque sqlite3_stmt pointer. This is what sqlite3_stmt really is, and a cache inside SQLite could use that for optimization, with reduced memory usage being the most obvious. Another example is if a statement has to be reprepared by SQLite because the schema changed then it only has to be done once, not for every sqlite3_stmt of the same SQL.

(25) By Pepijn Van Eeckhoudt (pepijnve) on 2023-09-12 06:34:09 in reply to 24

The target of a statement cache is NOT an app developer writing code in C directly using the SQLite API, carefully and tightly controlling everything their code does.

I think there's value in both. One does not preclude the other. But I understand your point. It would be nice that things just automatically work for all users.

FWIW, I'm looking at this as both an application and a wrapper developer. I write application code in Java that uses SQLite. The wrapper I use is an in-house developed one that binds the C API using libffi. The C functions are all exposed directly at the lowest level. There's a more idiomatic OO version of that built on top of it that's still the SQLite API in structure. And then on top of that there's the JDBC driver.

Having the direct SQLite API available is still useful. There's an impedance mismatch between the JDBC API on the one hand and the SQLite API on the other hand. Where I really need minimum overhead it's beneficial to program with the SQLite API directly. And there, caching would be useful.

If there were a cache-facilitating API available in SQLite (see other thread) I would be inclined to implement that on the Java side in the JDBC layer. An LRU cache is much simpler to implement at that level (since it's part of the standard library already) rather than in C.

(28) By David Jones (vman59) on 2023-09-14 09:52:57 in reply to 17

Here's a proposal for a cache object that lets library writers request caching with a modicum of control over the cache size. The central idea is a provisional sqlite3_stmt that can be the fifth argument to sqlite3_prepare_v3 when flagged as such by the fourth argument.

/*
 * Proposed API for caching assistance.
 *
 * New functions:
 *    int sqlite3_stmtcache_create()
 *    int sqlite3_stmtcache_search()
 *
 * Modified functions:
 *    int sqlite3_prepare_v3()
 *    int sqlite3_stmt_status()
 */
int sqlite3_stmtcache_create (
 	sqlite3 *cnx,
	int max_entries,
	int max_program_size,
	int max_storage,
	sqlite3_stmt_cache **cache );   /* OUT: returns cache object address */

int sqlite3_stmtcache_search (
	sqlite3_stmt_cache *cache,
	const char *key,
	sqlite3_stmt **prov_stmt,     /* OUT: new object */
	int flags );
/*
 * The return-list flag for search makes prov_stmt return a bound
 * statement object whose result set is a list of the cache entries.
 * Schema for the cache's virtual table:
 *    CREATE TABLE x(key, state, create_time, size, hits);
 */
#define SQLITE_CACHE_RETURN_LIST 0x01

/*
 * Extra codes used by sqlite3_prepare_v3() and sqlite3_stmt_status().
 */
#define SQLITE_PREPARE_PROVISIONAL 0x08
/*  If set in prepflags, indicates the ppstmt argument is a pointer to the
 *  provisional object returned by sqlite3_stmtcache_search().
 */

#define SQLITE_STMTSTATUS_CACHE_STATE 98
/* Additional op code for sqlite3_stmt_status, whose 'counter' value is
 * one of the following values:
 *    SQLITE_CACHE_STATE_IGNORED
 *    SQLITE_CACHE_STATE_NASCENT
 *    SQLITE_CACHE_STATE_LOADED
 *    SQLITE_CACHE_STATE_OVERSIZE
 *    SQLITE_CACHE_STATE_BOUND
 */
#define SQLITE_CACHE_STATE_IGNORED 0
/* Cache not used, due to one of:
 *   - Statement was prepared without SQLITE_PREPARE_PROVISIONAL flag.
 *   - Search found nascent entry, continue without caching to avoid race.
 *   - Cache limit exceeded and could not free existing entry.
 */

#define SQLITE_CACHE_STATE_NASCENT 1
/* Possible status for provisional statement between search and prepare calls,
 * indicating an incomplete entry was created. The prepare call will compile
 * the SQL and complete the entry.
 */

#define SQLITE_CACHE_STATE_LOADED 2
/* Possible status for provisional statement between search and prepare calls,
 * indicating an existing entry was found that prepare may use.
 */

#define SQLITE_CACHE_STATE_OVERSIZE 3
/* Possible status for provisional statement between search and prepare calls,
 * indicating an existing entry was found, but previous prepare step created
 * an oversize entry. The prepare call will update cache statistics, convert
 * state to ignored, and proceed with prepare.
 */

#define SQLITE_CACHE_STATE_BOUND 4
/* State after prepare call, VDBE program is saved in cache entry. Statement
 * finalize will decrement ref count.
 */

(7) By David Jones (vman59) on 2023-09-06 12:26:14 in reply to 4

What about having a fork statement call which lets a thread create a new sqlite3_stmt object where the VDBE program is a reference back to the parent but all other state is independent? The thread would treat it as any other statement object and finalize would handle updating reference counts.

(11) By Larry Brasfield (larrybr) on 2023-09-07 16:11:00 in reply to 7

I would be wary of having that "parent" be somehow more special than the "forked" sqlite3_stmt objects. Lifetime management issues arise from that.

But there is nothing to prevent sqlite3_stmt objects from sharing a VDBE program. It is already kept in memory separate from the struct known as sqlite3_stmt. Its ownership is a one-to-one relation now, but that could change in a well known way.

(18) By Pepijn Van Eeckhoudt (pepijnve) on 2023-09-09 15:18:40 in reply to 4

Right. It's a key/value store where the keys are whole SQL statements and the values are VDBE programs.

The VDBE instructions can be affected by collected statistics during optimization, so "solely" is a slight stretch.

How would the cache handle the optimisation case? Would this (logically) be done by using sqlite3_expanded_sql to obtain the cache key?

(19) By Larry Brasfield (larrybr) on 2023-09-09 15:55:11 in reply to 18

How would the cache handle the optimisation case?

I don't understand the question. Optimization is done as (or before, actually) the VDBE program is generated. It is just that program that goes into the cache.1

Would this (logically) be done by using sqlite3_expanded_sql to obtain the cache key?

No. That API returns SQL with bound parameters substituted. The cache key needs to be SQL matching what clients present to sqlite3_prepare(); to do otherwise would defeat avoidance of the compilation.


  1. ^ Some data associated with the VDBE program would stay with it, such as how many registers it needs, their initialization, etc.

(22) By Pepijn Van Eeckhoudt (pepijnve) on 2023-09-11 08:39:22 in reply to 19

I don't understand the question. Optimization is done as (or before, actually) the VDBE program is generated. It is just that program that goes into the cache.1

Perhaps the documentation is wrong or I'm misinterpreting it, but today sqlite3_prepare docs state:

If the specific value bound to a host parameter in the WHERE clause might influence the choice of query plan for a statement, then the statement will be automatically recompiled, as if there had been a schema change, on the first sqlite3_step() call following any change to the bindings of that parameter.

Intuitively that makes sense since depending on statistics the most optimal query plan (and as a corollary the VDBE program) may differ. That's why I was asking the expanded_sql-as-key question.

(9) By Pepijn Van Eeckhoudt (pepijnve) on 2023-09-07 13:09:42 in reply to 1

This would, I submit, reduce motivation for applications to be keeping prepared statements lying around or to be trying to use a single prepared statement from multiple threads with the synchronization that entails.

How would this operate in a multi-threaded context? Since SQLite does not support/allow use of database connections and derived objects from more than one thread concurrently, concurrent queries require multiple database connections. To get any benefit from a VDBE cache, the cache would have to cross connection boundaries. Or am I misinterpreting the multithreaded use restrictions?

Unrelated to that, it feels (purely on intuition) like the cache management might get tricky very quickly. Schema changes are one thing, but if I understand it correctly statistics also play a role in query planning. So running analyze might have to invalidate the cache as well. Or would you provide a manual cache invalidation mechanism to avoid getting stuck with a suboptimal cached query plan?

That being said, I have often thought while working on my application that recompiling the same SQL string to VDBE seems so wasteful as well. The API I was intuitively looking for to help with that was something that could give me an opaque pointer to the compiled version of the statement that I could pass in to sqlite3_prepare as something it might be able to reuse. Something explicit rather than transparent/automatic in other words.

(10) By Larry Brasfield (larrybr) on 2023-09-07 15:59:02 in reply to 9

How would this operate in a multi-threaded context?

Clearly, for library builds that support multithreading, the caching mechanism would have to be thread safe. There are many ways to do that, and the critical sections needed to do so would be short. Conceptually, it is the same problem that is presented by thread-safe containers and allocators.

Since SQLite does not support/allow use of database connections and derived objects from more than one thread concurrently, concurrent queries require multiple database connections. To get any benefit from a VDBE cache, the cache would have to cross connection boundaries.

That last assertion slightly overstates the case. Even a single-threaded (and single-connection) application could benefit from avoided SQL recompilation.

But your connection boundaries point is apt. To be most useful, the VDBE program cache would need to be available to multiple connections, just as allocated memory is. And it would need similar protection against concurrent access. But that protection need not extend to use of the cached (or shared out) objects; it only needs to cover the management of the cache or updates of the VDBE program objects when schema changes occur.

... So running analyze might have to invalidate the cache as well.

That's an interesting proposition. The effects of ANALYZE are optimizations, affecting performance rather than correctness. Having prepared statements lying around, even today, is bound to occasionally result in lost optimization opportunities. I would not view this as "cache invalidation", but the question of when ANALYZE takes effect, and whether any SQL compilation can be reused across execution of ANALYZE, certainly deserves careful thought.

Or would you provide a manual cache invalidation mechanism to avoid getting stuck with a suboptimal cached query plan?

Perhaps running "ANALYZE ..." implicitly has that effect. After all, it takes time (presumably), and dumping unused cache objects incurs just a time penalty. Even cache objects presently held by sqlite3_stmt objects could be marked for recompilation (or replacement with a recompiled VDBE program).

... recompiling the same SQL string to VDBE seems so wasteful as well.

Yes, that is why the Tcl adapter keeps some prepared statements lying around.

[on explicit management of compiled statements]

In my view, that's a much bigger step. I would defer that until the details of automatic management are worked out and some experience is gained with it. However, that sort of control is why I mentioned that sqlite3_prepare_v3() might gain a flag value to affect caching.

(12) By anonymous on 2023-09-07 20:43:55 in reply to 10 [link] [source]

sounds like https://www.sqlshack.com/understanding-sql-server-query-plan-cache/

(26) By Donal Fellows (dkfellows) on 2023-09-12 11:10:25 in reply to 10 [link] [source]

To be most useful, the VDBE program cache would need to be available to multiple connections

However, that requirement makes the cache massively more complex. By comparison, making the cache per-connection simplifies things a lot, since so much other state is also per-connection, and we already know that the cost of recompiling SQL to a VDBE program is usually not too great, as that is what happens right now. It won't support all scenarios equally well, but it will make the implementation simpler. (It also means that the implementation of ANALYZE doesn't need to be careful; it can just spoil the cache if it changes anything at all, an operation easily implemented with an in-memory-only epoch counter.)

The observation that it would be useful to have a flag to mark a statement as being cacheable/non-cacheable makes a lot of sense to me. It doesn't require the implementation to cache statements, but it lets client code indicate that some statements are definitely not a good idea to cache. It would be entirely a matter for the cache implementation whether it has size limits for SQL or implements a fixed size with an LRU eviction policy. (Tcl's regular expression engine does that sort of trick on a per-thread basis.)

(27) By Stephan (stephancb) on 2023-09-12 15:31:16 in reply to 26 [link] [source]

Could such a VDBE program cache live in a new system table (like `sqlite_schema`) which (optionally) goes together with the database file?

  • then multiple connections could use it?
  • as a bonus, such a table would provide some kind of "frequently used queries" (FUQs) record demonstrating how the database has been queried (the table would need both the original SQL string and the VDBE program, which the cache probably would hold anyway).

I realize some complications with this, like attached databases.

(29) By mlaw (tantaman) on 2023-12-11 16:44:46 in reply to 1 [link] [source]

Any updates on the statement cache? Is it likely to be implemented?