(1) By anonymous on 2021-06-03 22:46:18 [link]
Hi, yesterday I encountered a very strange issue with SQLite. It is hard for me to isolate the issue to a small reproducible scenario, so I will start with a written description and maybe somebody can reveal something related to it:

* my app uses lots of connections from many threads
* we have unit tests that perform lots of migrations and check relevant properties of the resulting schema
* migrations are always run as one of the first things we do with our dbs (after checking the version and setting some pragmas)
* up to yesterday the unit test didn't fail
* yesterday I changed which connection the unit test should use; most likely the unit test previously used the same connection that performs migrations, but now it is another one (though I am not sure about this)
* one of our unit tests creates a db with a very old schema and then performs a bunch of migrations
* in the initial schema we had a view named, say, XXX, but after many migrations we removed this view and created a table with the same name XXX
* the unit test checks that this new table XXX has a unique constraint on two columns by inserting the same pair of values twice and expecting the second attempt to fail with a constraint violation
* yesterday this test started failing on the FIRST insertion, saying that XXX is a view
* I guessed that SQLite caches the schema in memory for each connection and didn't update it after the migrations, so my next guess was that a VACUUM operation (one of the most global changes from SQLite's point of view) should force the cached schema to be updated, and indeed that fixes my unit test
* our users may have very large dbs, and calling VACUUM after each migration is a real performance penalty, especially as we are going to release a new version of our app that has a MASSIVE schema change spread over a series of migrations, which would multiply the VACUUM calls tenfold
* of course, opening my db in DB Browser shows the most recent schema, and all queries from my unit test work as expected. I checked this specifically during a debug session, just before performing the first insertion from my app; in DB Browser everything is fine
* I work on Windows and reproduce the issue on Windows, but our app is cross-platform and works on Mac and Linux as well; however, I don't know whether the issue is present on other platforms
* one more note that might be related: my app uses a custom VFS (based on Qt 5.12) that we know has issues with flushing/syncing data to disk as a trade-off for performance, but this is likely not related in this case, since DB Browser sees the recent schema

SQLite should update the cached schema on all connections automatically, or the documentation should specify how and when exactly I, as a developer, should do it manually. Using VACUUM as a fix is likely not acceptable due to the performance penalty (unless we can't find a better option), and I would like to understand whether this is a programmer error or a bug inside SQLite.
(2) By Simon Slavin (slavin) on 2021-06-04 11:40:46 in reply to 1 [link]
Please explain what you mean by 'migration'. What does your software do for each 'migration'? Is the database stored on a disk in the computer doing the processing, or is it accessed across a network? Do you have more than one connection accessing the same database at the same time? Do you have more than one thread using the same connection at the same time?

> in the initial schema we had a view with the name say XXX, but after massive migrations, we removed this view and created a table with the same name XXX

Having a table and a view with the same name has proved to cause problems in the past. Can you rewrite some of your code to avoid it?

> one more note, that might be related to the issue, my app uses custom vfs(based on 5.12 qt) that we know has issues with flushing/syncing data to disk as a trade-off for performance but likely this is not related in this case due to db browser has the recent schema

When you access the database with a database browser to diagnose your problem, are you using that same VFS? If not, does your custom VFS understand the locking that the other VFS might be using to access the same database?
(3) By anonymous on 2021-06-05 03:52:51 in reply to 2 [link]
> Please explain what you mean by 'migration'

A migration is a change in the schema of one or more tables, a change in the data of one or more tables, or both, needed when our app introduces new functionality.

> What does your software do for each 'migration' ?

The workflow in our app for any created connection is:

1. open the database with flags SQLITE_OPEN_CREATE | SQLITE_OPEN_READWRITE | SQLITE_OPEN_FULLMUTEX and our VFS
2. set pragmas: journal_mode = PERSIST, foreign_keys = ON
3. set some handlers, like a busy handler, a commit hook, and a custom regexp function
4. read database metadata, like the version and a uuid stored in a one-row table
5. check whether the latest db version the app expects is higher than the version stored in the db

The migration-related workflow (when the app version is higher than the db version) is as follows. We use three numbers to encode the version: major/minor/patch. The major number changes when we are going to break forward compatibility (a new db won't work in an old app version), the minor version changes when we introduce any table schema change, and the patch changes when only a data change is needed:

1. the app searches for all registered migrations that need to be applied to the db based on the current db version; each migration provides a pair of (fromVersion, toVersion) metadata
2. for each migration a new transaction is started, mostly to isolate each migration from the others
3. we run all migrations consecutively until the db version reaches the current app version

After the db is fully migrated to the current app version, it is registered to be used for the rest of the app logic.

> Is the database stored on a disk in the computer doing the processing, or is it accessed across a network ?

My app never migrates a remote db; only a local db can be migrated, and the app doesn't use conventional OS facilities for accessing a db over the network. Our VFS was developed specifically to access a db over the network.

> Do you have more than one connection accessing the same database at the same time ?

Yes, this is possible in the app, but it is not the case for this unit test. For unit tests we use gtest, and this one is 100% single-threaded; however, some other unit tests can be multithreaded.

> Do you have more than one thread using the same connection at the same time ?

No, this is prohibited/handled via a connection pool. Whenever any piece of the app wants to work with a db, it asks for a free connection from the pool; when it is done, it returns the connection to the pool, with some cleaning routines. So any connection can be used by only one thread at a time, but by many threads during its lifetime.

> Having a table and a view with the same name has proved to cause problems in the past. Can you rewrite some of your code to avoid it ?

In version 1.x.0 we created a view XXX; then in version 2.a.0 we introduced a new table with an "internal"/temporary/pre-release name YYY; then in version 2.a+n.0 we removed the XXX view and renamed the table from its internal name YYY to the same name the view had, XXX. Yes, surely I can change the table name, but that is a kind of last resort. The codebase of my app has a kind of poor man's ORM which provides some metadata about the db schema, like table and column names, which are then used in the C++ code that dynamically generates queries; I think you can imagine what kind of unreadable nightmare that is. Since version 2.x, all queries that can be hardcoded are now hardcoded, so there are lots of queries where this table name is hardcoded. I have already changed it once, from the internal name YYY to the release name XXX, so it is definitely possible to change it a second time, but last time it caused a series of bugs; in other words, from a QA point of view this is a VERY risky change that would require a full regression test of my app.

> When you access the database with a database browser for diagnosing your problem, are you using that same VFS ?

No, I always use a vanilla DB Browser without committing changes, specifically to rule out any potential issues related to our VFS.

> If not, does your custom VFS understand the locking that the other VFS might be using to access the same database ?

Our VFS implements the SQLite locking protocol internally/in memory; it doesn't expose any data that could be used to play nicely with another VFS, and it doesn't respect any other VFS's locking. This means that when I open a db in DB Browser while my app is accessing the same db, DB Browser can't know whether it is safe to commit changes, which is why I never commit changes from DB Browser; but it is good enough for debugging purposes. Our app with this VFS has existed for more than 7 years now, so apart from flush/sync I am not aware of any other VFS-related issue.

I believe that before my recent change the unit test used either the same connection that performs the migration or a connection that was created AFTER the migration of the db. Now, with my changes, I believe the unit test uses a connection that is created BEFORE the migration starts. The test is the first to ask for a connection from the pool (which is empty initially). The pool creates one and returns it to the test; at this point the db has a view named XXX. Then the migration code (in the same thread) asks for another connection from the pool, which in turn creates a new one. The migration code performs all migrations in this second connection and returns it to the pool. At this point the db no longer has a view XXX, but has a table with the same name XXX. Then the test continues with the first connection. I assume this first connection (used by the test) doesn't refresh its cached schema, but when I add a VACUUM after each migration's transaction, the connection used by the test sees the most recent schema and the test passes.
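The versioned migration workflow described above (each migration a (fromVersion, toVersion) pair, one transaction per migration) can be sketched roughly as follows. The table names, the migration registry, and the use of `PRAGMA user_version` for version storage are illustrative assumptions only; the poster stores the version in a one-row table, and the real app is C++ over the SQLite C API:

```python
import sqlite3

# Hypothetical migration registry: (from_version, to_version, statements).
MIGRATIONS = [
    (1, 2, ["CREATE TABLE yyy(a INTEGER, b INTEGER, UNIQUE(a, b))"]),
    (2, 3, ["DROP VIEW xxx",
            "ALTER TABLE yyy RENAME TO xxx"]),
]

def migrate(conn: sqlite3.Connection) -> None:
    """Apply all pending migrations, one explicit transaction per migration."""
    for from_v, to_v, statements in MIGRATIONS:
        (current,) = conn.execute("PRAGMA user_version").fetchone()
        if current != from_v:
            continue  # this step is already applied or not yet reachable
        conn.execute("BEGIN")
        try:
            for sql in statements:
                conn.execute(sql)
            # The version bump commits atomically with the schema change.
            conn.execute(f"PRAGMA user_version = {to_v}")
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise

# isolation_level=None puts the connection in autocommit mode so the
# explicit BEGIN/COMMIT above is the only transaction control in play.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE VIEW xxx AS SELECT 1 AS a")
conn.execute("PRAGMA user_version = 1")
migrate(conn)
```

Keeping the version bump inside the same transaction as the schema change means a crashed or rolled-back migration never leaves the version ahead of the actual schema.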
(4) By Kees Nuyt (knu) on 2021-06-05 11:47:52 in reply to 3 [link]
If the migration involves `PRAGMA writable_schema=ON`, you'd better discard the connection in which you did that and open a new one.

Richard Hipp has suggested several ways to refresh the cached schema.

From: <https://sqlite.org/forum/forumpost/29bad1c5c1>

> There are many ways to force a check of the schema to see if it needs to be reread. Here is one:
> `PRAGMA user_version;`

From: <https://sqlite.org/forum/forumpost/8067ae2895>

> OK, so it looks like you have to actually do something that needs to use the schema before the schema is checked. Something like this will suffice:
> `SELECT 1 FROM sqlite_master LIMIT 1;`
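Both suggestions boil down to executing any statement that touches the schema, which makes the connection check the schema cookie and reread `sqlite_master` if another connection changed it. A tiny helper (a sketch using Python's bundled sqlite3; a connection pool could call it on each connection before handing it out):

```python
import sqlite3

def refresh_schema(conn: sqlite3.Connection) -> None:
    """Force a schema-cookie check on this connection so that a schema
    changed by another connection is reread before real work starts.
    Either statement from the thread is sufficient; both are shown."""
    conn.execute("PRAGMA user_version").fetchone()
    conn.execute("SELECT 1 FROM sqlite_master LIMIT 1").fetchone()

conn = sqlite3.connect(":memory:")
refresh_schema(conn)  # cheap: reads only the header and schema table
```

Both statements are inexpensive compared to VACUUM, since neither touches table data.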
(5) By Simon Slavin (slavin) on 2021-06-05 15:36:26 in reply to 3 [link]
Thank you for your detailed response. Kees picked up two important points that your response touches on. You seem to be programming carefully, taking care over the problems that come with multi-access. I am slightly concerned about how your VFS works, but I have no experience with writing a VFS, and your VFS is unusual in that it does network access. I don't know how this might complicate the inner workings of SQLite. I can only wish you good luck.
(6) By anonymous on 2021-06-06 03:48:36 in reply to 4
I see the point; I will try it soon. Although it should help each connection individually, I would like to have a mass fix. This approach requires each connection to run one of the schema-reread queries, but what I am really looking for is something like VACUUM (only much more performant) that forces all open connections to reread the schema on their next statement execution.

Now I know when to trigger a schema reread, immediately after migration, but I don't have a list of all open connections (some of them can be created bypassing the pool); even if I had such a list, some of those connections could be in use by other threads at the time. I can set a flag in my connection wrapper (over sqlite3) and then run a reread query on the next attempt to prepare a statement, but this is not so simple and is error-prone, because my app caches some statements for repeated execution (mostly queries that take no user-provided parameters and fire quite often). So I would need to retire all of those cached prepared statements as well.

Right now my app uses journal mode, so there is a single exclusive mutating statement (the migration) and I don't need to worry about other read statements running simultaneously (that is not possible). But we plan to start using WAL where possible, and it is not clear to me what the proper handling of this case is then. Should already-running statements be interrupted or not? Interruption is "correct" but not always reasonable, because a schema change is quite often compatible with the running statements, so there is no point in interrupting them. On the other hand, I, as a developer, don't have the facilities to tell SQLite that a statement needs recompiling, other than interrupting and finalizing it.

It would be easy for SQLite to monitor all CREATE, ALTER, and DROP queries and handle such cases properly, so from my point of view this clearly should be done internally in SQLite, as it has many more options for doing it properly. I believe this is an SQLite bug, though not a very important one for most users. Anyway, thank you for your help.
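The "flag in the connection wrapper" idea can be sketched with a process-wide schema generation counter: a migration bumps the counter after it commits, and each wrapper compares its own recorded generation on the next execute; if it is stale, the wrapper drops its cached statements and runs a cheap schema-touching query. All names here are hypothetical, and Python's sqlite3 (which manages its own statement cache internally) stands in for the C API, where the cache would hold `sqlite3_stmt` handles to finalize:

```python
import sqlite3

class SchemaAwareConnection:
    """Connection wrapper that retires cached statements after a schema
    change elsewhere in the process (design sketch, not production code)."""

    generation = 0  # process-wide schema generation, shared via the class

    @classmethod
    def schema_changed(cls):
        """Call once, right after a migration commits."""
        cls.generation += 1

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.seen = SchemaAwareConnection.generation
        self.cache = {}  # sql text -> cached statement (illustrative only)

    def execute(self, sql, params=()):
        if self.seen != SchemaAwareConnection.generation:
            self.cache.clear()  # retire statements compiled for the old schema
            # Touch the schema so SQLite rereads it if the cookie changed.
            self.conn.execute("PRAGMA user_version").fetchone()
            self.seen = SchemaAwareConnection.generation
        return self.conn.execute(sql, params)
```

This handles connections sitting idle in the pool; connections created outside the pool, or statements already mid-execution under WAL, would still need separate treatment, as the post above notes.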
(7) By Kees Nuyt (knu) on 2021-06-06 13:49:16 in reply to 6 [link]
Yeah, it is complicated; I don't know where to begin, and I can't address all of your concerns.

> but what I am really looking for is something like a vacuum (but much more performant)

If VACUUM is a working but too slow solution, this could be used instead:

```sql
BEGIN IMMEDIATE TRANSACTION;
  -- migration steps
PRAGMA schema_version = 1+(SELECT schema_version FROM pragma_schema_version);
COMMIT TRANSACTION;
```

[Some warnings apply](https://www.sqlite.org/pragma.html#pragma_schema_version)

I have no idea how safe this is when more than one connection updates the same database schema concurrently, other than "they shouldn't do that".

> but we have plans to start using WAL where this is possible, and it is not clear for me what is the proper handling of this case. Should already running statements be interrupted or not? Interruption is "correct" but not always reasonable due to change in the schema can be quite often compatible with running statements so there is no point interrupting them.

As I understand it, any readers (SELECT) that are already in sqlite3_step() before another connection changes the schema will continue to see the schema and table definitions that were valid when the SELECT was prepared and its first sqlite3_step() executed. In other words, (cached) prepared statements can only be invalidated and recompiled when the first step is made and the engine sees that the schema has just changed. Once the first step is made, the statement will proceed to execute all further steps without recompiling again, because they run within a transaction (implicit or explicit).
(8) By anonymous on 2021-06-08 11:31:55 in reply to 7 [link]
Unfortunately, the `PRAGMA schema_version` approach doesn't work.