sqlite3_last_insert_rowid and sqlite3_changes in a multi-threaded envirnoment

(1) By Lieven Geeraert (lieveng) on 2022-09-13 07:25:25 [link] [source]

As clearly mentioned in the doc's sqlite3_last_insert_rowid and sqlite3_changes are not safe to use in a multi-threaded environment when the same database connection is shared between different threads (ex. in threading mode = serialized = the default)

There is no guarantee sqlite3_last_insert_rowid and sqlite3_changes return the expected values (the result of the last query of the current thread) as there is always a small chance another thread has executed a query on the same database connection changing these values.

Currently the only 'good' work around is to use separate database connections for each thread, what can be an overkill, and doesn't use the 'power' and ease of coding of the serialized threading mode.

An easy fix could be to store a copy of the changes and last_insert_rowid in the sqlite3_stmt (Vbde struct) and provide api functions to retrieve these copies.

ex. int sqlite3_stmt_changes(sqlite3_stmt *pStmt); int sqlite3_stmt_last_insert_rowid(sqlite3_stmt *pStmt);

One could then use following sequence to get the result in a thread safe way

sqlite3_stmt *stmt; sqlite3_prepare_v2(db,"update ...",-1,&stmt,0); sqlite3_step(stmt); int changes=sqlite3_stmt_changes(stmt); sqlite3_finalize(stmt); if (changes) { .... }

If the development team is willing to review and integrate this, I'm willing implement this and supply it as a patch for review / merge. (of course free of any copyright,... from my side)

(2) By ddevienne on 2022-09-13 09:40:15 in reply to 1 [link] [source]

Currently the only 'good' work around is to use separate database connections for each thread

You can use the returning clause instead.

But using one connection per thread is the best advice.

(3) By Gunter Hick (gunter_hick) on 2022-09-13 11:27:12 in reply to 1 [link] [source]

Your solution is still not threadsafe. It just replaces one non-threadsafe field in the internal connection object by many non-threadsafe fields in the internal statement object(s) attached to the connection.

You seem to imply that a thread wanting to do work must prepare, execute and finalize each statement it wishes to execute anew for each desired execution of the same query.

This is contrary to the design concept and established practice of preparing a set of queries against a connection with the intent of re-running them multiple times, thus amortizing the significant cost of preparing a statement over multiple executions and also improving execution latency. The _V2 version of the sqlite3_prepare interface even takes care of detecting and automatically re-preparing a statement after a schema change.

Note that sharing a connection between threads means none of the threads can be sure about the transaction state of the connection unless it can be guaranteed that only autocommitted statements are executed, effectively disabling the concept of transactions.

(4) By Lieven Geeraert (lieveng) on 2022-09-13 13:06:41 in reply to 3 [link] [source]

Hi,

Your solution is still not threadsafe. It just replaces one non-threadsafe field in the internal connection object by many non-threadsafe fields in the internal statement object(s) attached to the connection.

I don't agree with this. The database connection object can be used safely (in serialized threading mode) between different threads, but not the stmt (Vbde) object. So if (a copy of) the changes and last_insert_rowid are stored inside the Vbde object, it's for sure the result of the last execution of this statement, and not the result of the last query on this connection (perhaps by another thread). My idea was to add 2 new int fields to the Vbde struct to store the changes and last_insert_rowid, and write them at the exact same time as when the equivalent fields are written to the sqlite3 connection struct.

This is contrary to the design concept and established practice of preparing a set of queries against a connection with the intent of re-running them multiple times

This is still possible!

sqlite3_stmt *stmt; sqlite3_prepare_v2(db,"update ...",-1,&stmt,0); while (...) { sqlite3_bind_...(stmt,...); sqlite3_step(stmt); if (sqlite3_stmt_changes(stmt)) { .... } sqlite3_reset(); } sqlite3_finalize(stmt);

Note that sharing a connection between threads means none of the threads can be sure about the transaction state of the connection unless it can be guaranteed that only autocommitted statements are executed, effectively disabling the concept of transactions.

Indeed. You cannot use transactions when sharing the database connection between threads. But this is unrelated to the last_insert_rowid and changes problem. If you argue that you must always use separate connection for each thread, then why was serialized threading mode ever implemented, and even made the default?

You can use the returning clause instead.

Indeed, I didn't know this one. I learned something new now. But this has the possible drawback of requiring a lot of memory when for ex. doing an update on a lot of rows, where we might only be interested in the number of rows affected (like sqlite3_changes()), and / or the last inserted row id, and not on the real data itself (as the returning clause does).

I'm just trying to solve a problem I have on an existing project, without doing major rewrites. Adding this functionality to sqlite3 is in my case the easiest fix, and I just propose to share my work to 'the world' as I can imagine other people will face the same problem

It's just a suggestion to make sqlite3 even better.

(5) By Keith Medcalf (kmedcalf) on 2022-09-13 15:38:20 in reply to 1 [link] [source]

There is no guarantee sqlite3_last_insert_rowid and sqlite3_changes return the expected values (the result of the last query of the current thread) as there is always a small chance another thread has executed a query on the same database connection changing these values.

Of course it is guaranteed that sqlite3_last_insert_rowid and sqlite3_changes return the result from the execution of the last query on the connection.

The matter of this "not being the one you intend" is your problem.

If you do not create3 a situation in which you cause the situation to arise, then it will not.

(6) By Keith Medcalf (kmedcalf) on 2022-09-13 15:42:51 in reply to 4 [link] [source]

The database connection object can be used safely (in serialized threading mode) between different threads, but not the stmt (Vbde) object.

This is incorrect. A statement can be called on multiple threads. You may prepare it in one thread, then retrieve one of each of the 1,000,000 returned rows by calling sqlite3_step on the statement in a different thread.

(7) By ddevienne on 2022-09-13 16:17:13 in reply to 4 [link] [source]

But this has the possible drawback of requiring a lot of memory when for ex. doing an update on a lot of rows

Nope, not for classic one-row-at-a-time insert.

Remember that step() is per-row, so you just read the rowid as-if from a regular single-row select from the statement, and so memory wise, it requires only what columns you ask to return, for a single row.

For updates that's different, but you did ask about sqlite3_last_insert_rowid.

It's just a suggestion to make sqlite3 even better

I'm afraid Richard might not see it that way... But who am I to judge :)

(8) By Keith Medcalf (kmedcalf) on 2022-09-13 17:49:51 in reply to 4 [link] [source]

If you argue that you must always use separate connection for each thread, then why was serialized threading mode ever implemented, and even made the default?

I would venture that serialized threading mode is cheaper to implement (and more useful) than implementing thread prohibitions.

As for why it was implemented was so that people could use it.

serialized threading mode is like buying insurance. There is nothing preventing you from using one (or more) connections per thread whether you are in serialized or nomutex mode, The difference is that if you purchased insurance (which is free and builtin, it just costs a wee bit of CPU time) then it does not matter if you "do not know what you are doing" because you will be protected from yourself. If you turn off insurance, then you may kill yourself.

(9) By Gunter Hick (gunter_hick) on 2022-09-14 06:39:03 in reply to 4 [link] [source]

Your use patern still requires a thread to prepare any statements it wishes to run (1 to n times a go) each time around; as opposed to exactly once, right after creating the connection (during thread setup), and finalizing them once, just before closing the connection (during thread rundown).

Serialized mode only serializes threads enough to prevent you from inadvertently killing yourself inside the SQLite library. It does not prevent you from killing yourself outside the library by making assumptions that do not hold in a multi-threaded environment (or indeed are documented not to be true).

Note that the constraint on multi-thread mode is: "no two threads using the same CONNECTION at any one time", where "using a connection" includes calling sqlite3_step() on a statement of this connection.

The least complicated way to ensure this is to give each thread exclusive use of its very own private connection. It can then pretend that it is running all by itself.

You can get away with creating a pool of connections and farming them out (regulating access to connections in your code) to threads as they need them, making threads wait for access. This includes a "pool" of one connection.

(10) By anonymous on 2022-09-14 06:51:57 in reply to 1 [link] [source]

Currently the only 'good' work around is to use separate database connections for each thread,

Using a mutex isn't good enough?

    sqlite3_mutex_enter(sqlite3_db_mutex(db));
    sqlite3_bind_value(insert, 1, newvalue);
    sqlite3_step(insert);
    newrowid = sqlite3_last_insert_rowid(db);
    sqlite3_reset(insert);
    sqlite3_clear_bindings(insert);
    sqlite3_mutex_leave(sqlite3_db_mutex(db));

(11) By Lieven Geeraert (lieveng) on 2022-09-14 12:21:04 in reply to 10 [link] [source]

Using a mutex isn't good enough?

I came up with this 'solution' too but I'm afraid this can deadlock, as we keep the mutex during the sqlite3_step call. Another thread can be running an separate insert/update/delete query (using prepare / step) and lock the database this way. (The mutex will not be automatically released anymore when calling the busy handler). Or we must not use busy handlers and handle SQLITE_BUSY everywhere...

(12) By Keith Medcalf (kmedcalf) on 2022-09-14 14:59:00 in reply to 11 [link] [source]

I would point out that you created the situation by ill-conceived design.

The solution is to re-conceive your design such that if you use one of these functions that retrieves data for "the last statement run on the connection", that you know which statement was the last one run on the connection.

It sounds like you do not know which statement was the last one executed on the connection, and therefore are using the "Plug-n-Pray" methodology.

Plug-n-Pray does not work reliably.

(13) By jose isaias cabrera (jicman) on 2022-09-14 15:44:06 in reply to 12 [source]

Plug-n-Pray does not work reliably.

This is one of the funniest thing you've said, Keith. Thanks.