Statements outside of explicit transactions in SQLite

(1) By James Oldfield (james.oldfield) on 2021-01-03 18:00:10 [link] [source]

There's a lot of waffle below, but I suppose you could summarise it all as: What happens to statements outside of explicit transactions? The documentation seems a bit unclear on this (although it's usually excellent by the way, many thanks for that).

Autocommit mode vs implicit transactions: Oddly, the section on implicit transactions doesn't mention autocommit mode. Autocommit mode only seems to be mentioned in passing on the sqlite3_get_autocommit(), which conversely doesn't mention implicit transactions! But both talk about statements not executed within an explicit transaction (BEGIN and COMMIT/ROLLBACK), so those are related... right? Am I right in thinking that an implicit transactions are only ever started when in autocommit mode, and conversely any statement executed in autocommit mode (except BEGIN ...) will start an implicit transaction?

Implicit transactions with multiple statements: This is my key confusion, where the documentation doesn't seem to match reality. The section on implicit transactions, linked above, says:

An implicit transaction ... is committed automatically when the last active statement finishes. A statement finishes when its last cursor closes ...

So I tried a test: In one thread, I ran a SELECT statement, but in between retrieving rows (2 at a time) I INSERTed some more rows (1 at a time) and did deliberate pauses. In another thread, using a different connection, I repeatedly did a SELECT COUNT(*). From what I understand of that quote, the SELECT and INSERTs in the first thread should have consisted of a single implicit transaction, because the "last active statement" on that connection had not yet finished (I hadn't reset the SELECT, and indeed continued to retrieve more rows through it afterwards). But, instead, the COUNT(*) kept increasing on the other thread, showing that the INSERT had committed immediately.

Is that quoted text correct? Does it mean something different than I interpreted it?

(Just to be clear: I'm actually very happy that statements not in a transaction are committed immediately, even when a cursor for a SELECT statement is also open. It's just that I'm confused about what the documentation is referring to.)

Can an implicit transaction hold up an explicit transaction being committed: The document File Locking And Concurrency In SQLite Version 3 mentions:

The SQL command "COMMIT" does not actually commit the changes to disk. It just turns autocommit back on. Then, at the conclusion of the command, the regular autocommit logic takes over and causes the actual commit to disk to occur.

But according to that earlier quotation, an implicit transaction won't be committed if a cursor is still open. Does this mean that even an explicit COMMIT statement for an explicit transaction might not actually commit data right away, if a cursor is still open from a statement? That would be very surprising! It seems not, given my previous experiment, but I just wanted to confirm this.

(2) By Tim Streater (Clothears) on 2021-01-03 19:20:12 in reply to 1 [link] [source]

Far as I know, any isolated statement (select, insert, etc) is automatically surrounded by BEGIN/COMMIT unless you've already opened an explicit BEGIN. My app has a couple of hundred usually isolated statements to update this or that part of some db or other. If I had to BEGIN/COMMIT each of these life would be rather tedious.

I also have a couple of places where I know I'll be doing a lot of inserts, so I do all those inside a transaction, for speed.

That clear?

(3) By Keith Medcalf (kmedcalf) on 2021-01-03 19:27:40 in reply to 1 [link] [source]

Your description is accurate. "Transaction" processing is controlled by "locks". All "lock" processing is "automatic". An "explicit transaction" is merely exerting manual control by "turning off (BEGIN) and on (COMMIT)" the automation.

A "read-lock" is required when reading from the database, and a "write-lock" is required to write to it (and exclusive access -- meaning no OTHER readers -- is required to "make it so").

Every statement, when executed, must acquire the locks that it requires for the duration of its execution. When the statement completes, the additional locks that the statement acquired are released. Locks which were already acquired on the same connection will not be released until that other statement that acquired them is complete.

This means that when you execute a SELECT statement on a connection, it will acquire a READ lock for the entire duration of the statement execution. If while this statement is executing you execute an INSERT statement (on the same connection), that connection must acquire a WRITE lock. When that INSERT statement completes then the data is written (assuming there is NO OTHER CONNECTION that has a READ lock by temporarily acquiring an EXCLUSIVE lock and writing the data) and releases the WRITE lock. It cannot release the READ lock because the SELECT still requires it. Once the SELECT statement is complete, then the READ lock can be released.

The BEGIN statement makes some small changes to this process. It sets a flag that stops the "automatic release of acquired locks". In addition it may also acquire a specific lock type at the time it is executed though the default is just to set the flag and not acquire any locks. If you use the form BEGIN IMMEDIATE, then a WRITE lock is obtained immediately as well as turning off the "auto release mode". Mutatis Mutandis BEGIN EXCLUSIVE with an EXCLUSIVE lock.

Now when you make changes (ie, execute an INSERT statement) and the "auto release" mechanism is turned off, the WRITE lock is held after the statement completes and the changes are not yet written to the database. Eventually a COMMIT statement will be executed. This will "turn the auto-release" flag back on and when the COMMIT statement ends it will release the locks THAT ARE NO LONGER REQUIRED (in the case of releasing an EXCLUSIVE or WRITE lock, it will write the changes to the database). However, if there is a statement executing which requires a READ lock then that read lock will not be released until that statement ends (which will now release its locks because the "auto-release" flag has now been turned back on).

(4.1) By Keith Medcalf (kmedcalf) on 2021-01-03 19:42:55 edited from 4.0 in reply to 3 [link] [source]

This "auto-release" process and the flag are called "auto-commit". What it means is that lock levels that are no longer required are "automatically released and associated data is committed to the database in the case of a WRITE lock" when a statement ends. BEGIN and COMMIT merely disable and enable this flag. The process of lock acquisition by a connection and the statements executing on it is unchanged.

(5) By James Oldfield (james.oldfield) on 2021-01-04 22:21:57 in reply to 4.1 [link] [source]

Many thanks for this very comprehensive reply. It's much appreciated, especially since I can see there's a continuous deluge of questions here.

Out of curiosity, I made a little of test of what you said, and verified it's correct (although I never doubted it): On one connection I executed BEGIN, then SELECT and started iterating over the results, then COMMIT and continued iterating over the earlier cursor. On another connection I repeatedly tried executing INSERT statements. This was in journaling mode so the SELECT held up the INSERTs as expected. Sure enough, as you said, this did not release on the COMMIT, and instead the INSERT only progressed once the SELECT was exhausted.

I have to admit, I find this very surprising. I'd expect COMMIT (or ROLLBACK) to release all locks, and an attempt to continue using an open statement from before to result in an error, something like "transaction for that statement has now been closed". Now I know, I'll be more careful about resetting incompletely iterated statements! (I'm actually using the Python sqlite3 wrapper module, so this translates to calling .close() on the Cursor object in that API.)

I have a follow up question: Are there are any statements that write to the database but return multiple rows? I just ask because, if so, that would mean that an open cursor to such a statement would cause the write lock to be retained even after a COMMIT. Assuming that's not the case, we can always be sure a successful COMMIT will at least store all data in the database, even if it potentially doesn't release a read lock.

(6) By Keith Medcalf (kmedcalf) on 2021-01-04 23:09:52 in reply to 5 [source]

All statements which "write" to the database occur within the processing of that one statement. The only statements which can "write" to the database are CREATE, INSERT, UPDATE, DELETE, ALTER, and DROP (I think I got them all).

In each case the database change (the writing) is entirely contained within the single "execution" of the statement (that is, the statement runs to entirely to completion in a single sqlite3_step). The execution state is not "held open" because there is no actual cursor involved (no result generator).

This is in contrast to a SELECT statement which is conceptually similar to a Python generator function. The first sqlite3_step causes the statement to begin execution (and obtain a read lock), and it then yields a row, and each time it is subsequently stepped, until eventually it runs out of rows (raises a StopIteration exception) at which point the execution context is closed (and the read lock released, if autocommit is in effect).

That is, it ends up looking like this:

def Statement()
    beginingLockState = getReadLock()
    while (row := findNextResult()) is not None:
        yield row
    if autocommit is True:
        releaseLocksExcept(beginningLockState)

There have been occasional requests to implement INSERT ... RETURNING ... and UPDATE ... RETURNING ... which would mean that those statements would generate result rows while executing their updates (although there are other methods which could be used to obtain the same effect without having to have the RETURNING clause that could already be implemented with existing facilities, doing that is complicated and some people are lazy).

At present, however, RETURNING is not implemented so all SQL statements which can perform updates are single-step execution contexts and only the SELECT statement is a result generator.

Note that the PRAGMA statement is a special case and may fall into either class (or in some cases is performed at PREPARE time and does not require execution at all) depending on the particular PRAGMA.

(7) By Keith Medcalf (kmedcalf) on 2021-01-04 23:24:52 in reply to 5 [link] [source]

Also note that the Python sqlite3 wrapper automagically manages the transaction state for you (success at this depends on the version of the PySqlite2 extension code that the wrapper is using -- newer versions use a better detection of "write" statements and old ones are limited by guessing based on the first word in the SQL statement).

This automagic is controlled by the isolation_level parameter of the sqlite3.connect function. If the isolation_level is not None (the default is '', an empty string) then it must be a character string which is appended to a BEGIN statement that is automagically executed before any write statement.

To turn off the automagic you need to open the connection with isolation_level=None.

See also APSW https://rogerbinns.github.io/apsw/ that behaves more like sqlite3 and less like a lowest-common-denominator DB-API interface.

(8.1) By James Oldfield (james.oldfield) on 2021-01-05 14:19:51 edited from 8.0 in reply to 7 [link] [source]

Thanks for another great reply.

Yes, the Python sqlite3 wrapper sometimes fiddles with transactions automatically. I actually started this whole process by innocently wondering how to start and stop transactions and ended up much further down the rabbit hole than I intended! At some point I hope to document my findings in a StackOverflow answer which, no doubt, no one will ever read :-) I'll include a note about closing cursors for incompletely exhausted SELECT statements.

As a result, I've spent a lot of time reading the source code to sqlite3 and pysqlite recently. Sadly, I can tell you that the logic for when to automatically start a transaction in sqlite3 does still depend on textual matching: if the statement starts with INSERT, UPDATE, DELETE or REPLACE (intended to catch DML statements) then a transaction is started just before executing the statement itself. This comparison skips over whitespace but not comments (e.g. "/*foo*/INSERT ..." is not considered to start with INSERT). This only happens if isolation_mode is not None, as you said, and only if a transaction is not already started (as determined with sqlite3_get_autocommit(), so it will notice transactions started by the developer manually executing BEGIN).

It's true that there was a move in the past to away from statement parsing. Here's how the situation got (back) to this point:

In the past (before Python 3.6 and before pysqlite 2.8.0), some statements would cause an implicit COMMIT beforehand (if a transaction was open, as determined with !sqlite3_get_autocommit()). Again, this only happened if isolation_mode is not None. The rule was: if a statement started with INSERT, UPDATE, DELETE or REPLACE (intended to catch DML), BEGIN a transaction; if it started with SELECT (i.e. DQL), do nothing; otherwise (this case is meant to catch DDL like CREATE TABLE), COMMIT a transaction.

Then pysqlite (commit 94eae50) got rid of all the above logic, and replaced it with simply: if !sqlite3_stmt_readonly() then start a transaction. That got rid of all the statement parsing, and the removal of the implicit transaction COMMIT meant you could include DDL like CREATE TABLE in a transaction implicitly started by pysqlite. But this exact version of the code never made it into a release.

The next day, the logic was changed (commit 796b3af) to if !sqlite3_stmt_readonly() && !is_ddl then start a transaction, where is_ddl was defined as a statement starting with CREATE, DROP or INDEX, so there was still text matching in there. This was done to improve backwards compatibility with existing code. This version still never implicitly committed transactions. This was released as pysqlite 2.8.0. That is still how it stands to this day, and is unlikely to change given that its readme says "This project is not actively maintained any more: You are better off using the sqlite3 module in the Python standard library".

This was initially merged into Python 3.6 beta, but issue 28518 was raised about backwards compatibility: conn.execute("BEGIN IMMEDIATE"), which had previously worked, would raise an exception "cannot start a transaction within a transaction". That's because this is not a read only statement (as explained in a comment by D. Richard Hipp), so a transaction was now being started automatically just before it was executed. Such statements are better suited to isolation_mode = None, but the issued showed the change would've broken existing code, so before Python 3.6 was released the code was changed to the current logic. Compared to Python 3.5, that meant that only the "otherwise" branch had really changed, and the release notes simply said "sqlite3 no longer implicitly commits an open transaction before DDL statements".

Funnily enough, the DB API spec - which as you said is the motivation for all this funkiness - says that a transaction should be opened before any statement at all, even a SELECT. This isn't clear from the document itself but I came across an archived email from the PEP author that says this is what they intended (but I can't find the link at the moment). Given that sqlite3 doesn't satisfy the spec, all this automagic behaviour seems like wasted effort!