Minor WASM OO1 API bug with exec() stale column info

(1) By Roy Hashimoto (rhashimoto) on 2023-05-10 17:18:59 [source]

This is a low likelihood and low severity bug, but I figured there's no harm in reporting it.

The OO1 API exec() implementation caches prepared statement data prior to beginning execution of the statement. If the database schema has changed from the last time the connection loaded it (either at open or the previous transaction), then the prepared statement may be re-prepared and the cached data could be stale.

For example, column names for the prepared statement are acquired here, before the statement is first stepped here. While executing sqlite3_step(), the statement can be recompiled:

If the database schema changes, instead of returning SQLITE_SCHEMA as it always used to do, sqlite3_step() will automatically recompile the SQL statement and try to run it again.

If the statement gets column names from the schema, e.g. with "SELECT * ...", the column names for the statement may have changed across the sqlite3_step() call, so the names exec() provides via columnNames or the callback may be stale.

Similarly the Stmt object created here also caches schema-dependent data (and can be accessed via the callback).

This only matters when:

There are multiple database connections, and
The schema is dynamically modified or replaced, and
The statement is prepared before its transaction begins, and
Column information for the statement is accessed by the caller.

A reasonable fix would be to grab the column info after the first step.

(2) By Stephan Beal (stephan) on 2023-05-10 17:46:32 in reply to 1 [link] [source]

A reasonable fix would be to grab the column info after the first step.

Very nice catch :). This seems like a very unlikely case, in particular for the web environment, so i won't push to have this in the pending 3.42 release (for which we are currently "pencil's down") but will patch it immediately after the release.

A reasonable fix would be to grab the column info after the first step.

Delaying caching until the first step() isn't quite right because the output columnNames option needs to be usable for data with no result rows:

       - `columnNames`: if this is an array, the column names of the
       result set are stored in this array before the callback (if
       any) is triggered (regardless of whether the query produces any
       result rows). If no statement has result columns, this value is
       unchanged. Achtung: an SQL result may have multiple columns
       with identical names.

but we can delay fetching the names until the first step() or, if there are no rows, collect the column names after the loop.

(3) By Roy Hashimoto (rhashimoto) on 2023-05-10 18:29:28 in reply to 2 [link] [source]

This seems like a very unlikely case, in particular for the web environment

I agree that it isn't urgent, but the reason I immediately knew this was a bug in your code is because I found this exact same bug in my code a few months ago while exercising my...web application.

Here's the plausible scenario how this can occur:

My web application can load database tables from different sources, like a CSV file or some online store, and when you load a new table it replaces any existing table with the same name. The app also has a SQL browser where you can interactively query the database. If you want, you can open the app in two browser windows so you can load (and do other things) in one and query in the other.

If I loaded a table, did a SELECT * FROM table, then loaded a new table to replace it, and repeated SELECT * FROM table, what happened is I saw the new table's rows with the old table's column headers. I thought this was a bizarre caching problem or I had corrupted my database until I figured it out.

Note that it doesn't require actual contention to express the bug, just separate connections.

(4) By Stephan Beal (stephan) on 2023-05-10 18:50:16 in reply to 3 [link] [source]

I agree that it isn't urgent, but the reason I immediately knew this was a bug in your code is because I found this exact same bug in my code a few months ago...

LOL! :)

Your report has me second-guessing how some other parts behave in the recompile case (which the back of my mind was vaguely aware of but which never got any genuine consideration) and wondering whether i need to, e.g., replace the cached-at-prepare() Stmt.columnCount value with a property intercepter which fetches it as needed.

In any case, i'm looking closely at this and hope to have a patch checked in in the next day or two, but it won't be merged into the trunk until after the 3.42 release.

Thank you once again for your continued feedback!

(5) By Stephan Beal (stephan) on 2023-05-10 21:36:49 in reply to 4 [link] [source]

In any case, i'm looking closely at this and hope to have a patch checked in in the next day or two,

That's now in src:/timeline?r=oo1-no-cache-Stmt.columnCount.

In short, that transforms columnCount to a property access proxy for sqlite3_column_count() and adjusts DB.exec() to delay fetching the column names until after the first step().

... but it won't be merged into the trunk until after the 3.42 release.

That still stands, as we're in "pencil's down" mode for the pending release, where only fixes for significant breakage are permitted in trunk. It will be merged into trunk after the release.