SQLite Forum

New type of transaction (or a new behavior for DEFERRED transactions)

(1) By RenatoF (renatof) on 2021-05-03 15:29:25

I suggest that a new type of transaction be added to SQLite (or a new behavior for DEFERRED transactions).

Whenever an attempt by transaction A to acquire an exclusive lock to write to the file would fail due to the existence of another writer (holding an intent-to-write lock) in transaction B, then instead of returning SQLITE_BUSY immediately, transaction A would simply release (drop) its lock, allowing transaction B to finish its work, and SQLite would automatically retry the write (trying to obtain a new lock) for transaction A, up to the timeout limit.

"Deadlocks" would never occur!

The risk of the same row being updated twice (the "retried" operation in A reverting a change just made by the write in transaction B, which A yielded to in order to prevent a deadlock) is minimal. That risk exists at the application level anyway, no matter which SQLite transaction method is used, whenever the statements in transaction A were processed/prepared shortly before the statements sent via transaction B, yet transaction B was opened shortly before A:

  • the application queries (SELECTs) data from the DB, processes it, etc.
  • the application (in another thread/process) prepares data as well
  • the B thread/process starts its transaction first!
  • the A thread/process starts its transaction later...

If this does not make sense to you, then you could at least consider making IMMEDIATE transactions the default. The performance may be lower than with DEFERRED transactions, but IMHO it better mimics how a client-server DBMS works, with SQLite handling the retries by itself with its own mechanism, even if that takes slightly more time, leaving DEFERRED transactions as an advanced option for those looking for increased performance at the "cost" of having to handle the retries at the application level.
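
(For illustration: this combination is roughly what BEGIN IMMEDIATE plus a busy timeout already gives today: with the timeout set, SQLite itself waits and retries the lock acquisition instead of returning SQLITE_BUSY at a later lock upgrade. A minimal Python sketch; the database and table names are made up:)

import sqlite3

conn = sqlite3.connect("app.db", isolation_level=None)  # autocommit; transactions are explicit
conn.execute("PRAGMA busy_timeout = 5000")  # SQLite retries internally for up to 5 seconds

conn.execute("BEGIN IMMEDIATE")  # take the write lock up front; waits if another writer is active
row = conn.execute("SELECT value FROM counters WHERE id = 1").fetchone()
conn.execute("UPDATE counters SET value = ? WHERE id = 1", (row[0] + 1,))
conn.execute("COMMIT")
conn.close()

Because the write lock is held from the start, the SELECT cannot be invalidated by another writer, so no application-level retry loop is needed (unless the timeout expires, at which point SQLITE_BUSY is still returned).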

(2) By Richard Damon (RichardDamon) on 2021-05-03 15:51:47 in reply to 1

The problem with not giving an error on the detected deadlock is that inconsistent results could happen. If transaction A reads some values, computes some changes based on those values, and then goes to write them but finds that someone else is doing a write, the values A computed might not be right after B finishes its write, because that write might have changed the values A read; thus A does need to redo its work.

In a more complicated world, SQLite could keep track of every record that a read transaction has read and only return the busy error if a write transaction changes some of that data, but that moves away from 'lite'; it is what some database systems do.

Note that, with the rules SQLite uses, it is impossible for an application to read 'stale' data and write back results based on it, as long as the full operation is part of a single transaction: any write operation that is in progress, or that finishes after the read transaction started (and thus might have changed values), makes SQLite return the error when trying to upgrade the read transaction to a write transaction. And not making it a single transaction is basically saying you don't care about the stale-data problem.

Making IMMEDIATE transactions the default would reduce the ability of a read-only transaction to start while a write transaction is in progress. You of course always have the option of making your application issue BEGIN IMMEDIATE for every transaction, or for every transaction that might want to write, so you don't need to worry about getting the deadlock BUSY answer.
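
(One way to follow that advice without repeating BEGIN IMMEDIATE by hand is a small helper; a sketch in Python, with illustrative database and table names:)

import contextlib
import sqlite3

@contextlib.contextmanager
def write_txn(conn):
    """Run a block inside BEGIN IMMEDIATE; commit on success, roll back on error."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        yield conn
        conn.execute("COMMIT")
    except BaseException:
        conn.rollback()
        raise

conn = sqlite3.connect("app.db", isolation_level=None)  # autocommit; transactions are explicit
conn.execute("PRAGMA busy_timeout = 5000")
with write_txn(conn):
    conn.execute("UPDATE counters SET value = value + 1 WHERE id = 1")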

(3) By RenatoF (renatof) on 2021-05-03 16:47:19 in reply to 2

> The problem with not giving an error on the detected deadlock is that inconsistent results could happen. If transaction A reads some values, computes some changes based on those values, and then goes to write them but finds that someone else is doing a write, the values A computed might not be right after B finishes its write, because that write might have changed the values A read; thus A does need to redo its work. (...) not making it a single transaction is basically saying you don't care about the stale-data problem.

That's why I suggested a new type of transaction could be created. When you are storing logs, for example, it makes no difference whether you store A,B or B,A.

> Making IMMEDIATE transactions the default would reduce the ability of a read-only transaction to start while a write transaction is in progress. You of course always have the option of making your application issue BEGIN IMMEDIATE for every transaction, or for every transaction that might want to write, so you don't need to worry about getting the deadlock BUSY answer.

It is simply non-intuitive that the default transaction type of a DBMS can cause deadlocks so often and require retry mechanisms at the application level, when no other DBMS requires that.

Making IMMEDIATE transactions the default would make SQLite much easier to adopt. Any quick search about SQLite turns up plenty of results of people saying it is not good, just because of the constant deadlocks (which could be avoided by using IMMEDIATE transactions).

Also, libraries and tools that "standardize" commands for different DBMS systems would not need to modify the "BEGIN" statement to "BEGIN IMMEDIATE" specifically for SQLite, causing even less friction in SQLite adoption.

Alternatively, there could be an SQLite-specific statement (preferably "IMMEDIATE") to acquire the write lock (with automatic retries, up to the defined timeout, rather than failing immediately) after the "BEGIN" command has been sent. Example:

BEGIN;
IMMEDIATE; <-- acquire write lock (as with BEGIN IMMEDIATE)
SELECT ...;
UPDATE ...;
COMMIT;

i.e.: BEGIN; + IMMEDIATE; would behave the same as the single statement BEGIN IMMEDIATE;

Furthermore, the statement could be sent outside a transaction to make all further transactions use that mode automatically:

IMMEDIATE; <-- set default transaction mode
BEGIN; <-- immediate transaction
SELECT ...;
UPDATE ...;
COMMIT;
BEGIN; <-- immediate transaction
SELECT ...;
UPDATE ...;
COMMIT;
DEFERRED; <-- set default transaction mode
BEGIN; <-- deferred transaction
SELECT ...;
UPDATE ...;
COMMIT;
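
(Neither IMMEDIATE; nor DEFERRED; exists as a statement in SQLite today; the following is purely a sketch, in Python, of how a thin wrapper could emulate the proposed per-connection default by rewriting BEGIN according to a mode flag. All names are hypothetical:)

import sqlite3

class ModalConnection:
    """Wrapper that plays the role of the proposed IMMEDIATE;/DEFERRED; mode statement."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path, isolation_level=None)  # autocommit
        self.mode = "DEFERRED"  # same default as SQLite today

    def set_mode(self, mode):
        self.mode = mode  # "DEFERRED", "IMMEDIATE" or "EXCLUSIVE"

    def begin(self):
        self.conn.execute("BEGIN " + self.mode)

    def execute(self, sql, params=()):
        return self.conn.execute(sql, params)

    def commit(self):
        self.conn.execute("COMMIT")

db = ModalConnection("app.db")
db.set_mode("IMMEDIATE")  # the equivalent of the proposed IMMEDIATE; statement
db.begin()                # now behaves like BEGIN IMMEDIATE
db.execute("UPDATE counters SET value = value + 1 WHERE id = 1")
db.commit()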

(4) By Keith Medcalf (kmedcalf) on 2021-05-03 19:13:31 in reply to 1

> Whenever an attempt by transaction A to acquire an exclusive lock to write to the file would fail due to the existence of another writer (holding an intent-to-write lock) in transaction B, then instead of returning SQLITE_BUSY immediately, transaction A would simply release (drop) its lock, allowing transaction B to finish its work, and SQLite would automatically retry the write (trying to obtain a new lock) for transaction A, up to the timeout limit.

The situation as it currently exists has been well documented, known, and researched since the introduction of relational DBMS systems way back in the 1970's, and for "concurrent" systems since their inception.

There have been many enhancements made to the SQL language as a result of this observation. The first, of course, was the introduction of the "FOR UPDATE OF" clause in DB2 back in the late 1970's or early 1980's (I do not remember exactly, since I was still in high school at the time). This was followed by the introduction of "BEGIN IMMEDIATE" in the mid-1980's or so, in order to allow avoidance of the lock-escalation deadlock by utilizing the (by then) well-known deadlock-avoidance mechanism of always acquiring all the locks you need at once, always in the same order, and always releasing them all when the acquisition of a required lock fails.

> If this does not make sense to you, then you could at least consider making IMMEDIATE transactions the default. The performance may be lower than with DEFERRED transactions, but IMHO it better mimics how a client-server DBMS works, with SQLite handling the retries by itself with its own mechanism, even if that takes slightly more time, leaving DEFERRED transactions as an advanced option for those looking for increased performance at the "cost" of having to handle the retries at the application level.

SQLite3 is not a client/server DBMS. Many client-server DBMSs, because they run on a whopping big server that may be thousands of miles away from the client, defer all "conflict detection" until actual COMMIT time. They can afford to do this because they are (a) very big, (b) very fast, and (c) very far away; and returning a "sorry about your luck" error when a commit fails for no apparent reason is acceptable.

The problem with doing what you suggest is that it would lead to undetectable errors. Because SQLite3 does not have a "server process" that can keep track of all the clients' independent doings, there is no way for the SQLite3 library to know what each database file user is doing. A client/server DBMS has a single server process that knows exactly what each client is doing and can be optimistic in its behaviour, because it can detect problems and conflicts later and, if they arise, issue a "sorry about your luck" error and abort the transaction -- something which SQLite3, being an in-process data access library, cannot do.

As for making BEGIN IMMEDIATE the default, this is not a good idea. Besides having a profound impact on concurrency, an application may not always want to start a transaction because it intends to perform an update. An application may begin a transaction in order for a sequence of reads to be repeatable. Although other mechanisms can exist to ensure repeatable reads, they all require the participation of a "server" process to co-ordinate multiple clients -- which does not exist in SQLite3.
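
(For example, a plain deferred BEGIN gives exactly that repeatable-read guarantee without ever taking a write lock: every SELECT inside the transaction sees the same snapshot of the database. A minimal Python sketch, with an illustrative ledger table:)

import sqlite3

conn = sqlite3.connect("app.db", isolation_level=None)  # autocommit; transactions are explicit
conn.execute("BEGIN")  # deferred: only a read lock is taken, on the first SELECT
total = conn.execute("SELECT SUM(amount) FROM ledger").fetchone()[0]
count = conn.execute("SELECT COUNT(*) FROM ledger").fetchone()[0]  # sees the same snapshot as the SUM
conn.execute("COMMIT")
conn.close()
print(total, count)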

You will note that many "wrap-around libraries" ("frameworks", as they are called these days -- gigabytes of code that does inscrutable things) may permit one to set that as the default.
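
(Python's standard sqlite3 module is one concrete example: its isolation_level connection parameter selects the transaction type used for the implicit transactions it opens, so "IMMEDIATE" can be made the per-connection default without touching the SQL. The table name below is illustrative:)

import sqlite3

# Implicit transactions opened by the module will use BEGIN IMMEDIATE;
# timeout=5.0 corresponds to a 5-second busy timeout.
conn = sqlite3.connect("app.db", isolation_level="IMMEDIATE", timeout=5.0)
conn.execute("UPDATE counters SET value = value + 1 WHERE id = 1")  # opens BEGIN IMMEDIATE
conn.commit()
conn.close()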

(5) By Keith Medcalf (kmedcalf) on 2021-05-03 19:23:13 in reply to 3

In theory, the proper place to do that would be a pragma such as, let's say, pragma transaction_default=deferred|immediate|exclusive, with the default behaviour being deferred, which would set the "default behaviour" for beginning a transaction (whether by begin or savepoint, when no explicit transaction type is given).

This would not help, however, in those cases where a "wrapper" implements a transaction by spinning up a new connection for the transaction, since there would be no way, without modifying the wrapper, to alter the default behaviour. And if you have to modify the wrapper anyway, then why not just modify the wrapper to work better in the first place, by allowing a specific type of transaction to be requested?

(6.1) By RenatoF (renatof) on 2021-05-03 21:33:47 edited from 6.0 in reply to 5

It would help a lot, actually. In PHP's PDO, for example, we could just send "pragma transaction_default=immediate" at selected times, before calling PDO::beginTransaction() (which does not take parameters), instead of creating extensions of the native PDO class (as Drupal did) just to use IMMEDIATE/EXCLUSIVE transactions (and having to selectively instantiate the proper class separately for every individual transaction).