SQLite User Forum

How to init a database schema with many concurrent accessors

(1) By JakeEhrlich on 2022-08-11 14:20:35 [link] [source]

I'm trying to make a tool that uses SQLite3 to cache some info. To access the cache, a process starts up, opens/creates the database, and makes sure the schema is initialized by running a batch of create ... if not exists statements through sqlite3_exec, so that it has the schema and all the indexes it expects. This works well enough with one process, but I keep getting SQLITE_LOCKED and SQLITE_BUSY under various circumstances.

I've tried many things but here's my current setup:

I have these pragmas

pragma auto_vacuum=incremental;
pragma journal_mode=wal;
pragma synchronous=0;
pragma locking_mode=normal;
pragma foreign_keys=on;

And then I start an immediate transaction to create the schema

begin immediate transaction;
create table if not exists ...
create index if not exists ...
create table if not exists ...
create index if not exists ...
create table if not exists ...
create index if not exists ...
...
commit transaction;

I then have the following C++ code (run once per process) to run this

    if (sqlite3_open_v2(db_path.c_str(), &db, SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE,
                        nullptr) != SQLITE_OK) {
      log_fatal("error: %s", sqlite3_errmsg(db));
    }

    if (sqlite3_busy_handler(db, wait_handle, nullptr)) {
      log_fatal("error: failed to set sqlite3_busy_handler: %s", sqlite3_errmsg(db));
    }

    char *fail = nullptr;
    int ret = sqlite3_exec(db, cache_schema, nullptr, nullptr, &fail);
    if (ret == SQLITE_BUSY) {
      log_info("warning: It appears another process is holding the database open, check `ps` for suspended job-cache instances");
    }
    if (ret != SQLITE_OK) {
      log_fatal("error: failed init stmt: %s: %s", fail, sqlite3_errmsg(db));
    }

As you can see, I've set the busy handler. Previously I was using busy_timeout, but I discovered that while it uses exponential backoff, it does not randomize. I got slightly improved results by implementing the randomization myself, but I'm still getting SQLITE_LOCKED messages when I run sufficiently complex setups. Here's my current busy handler.

#include <random>    // std::random_device
#include <unistd.h>  // usleep

static int wait_handle(void*, int retries) {
  // We don't ever want to wait more than ~4 seconds.
  // If we wait more than ~4 seconds we fail.
  constexpr int start_pow_2 = 6;
  constexpr int end_pow_2 = 22;
  if (retries > end_pow_2 - start_pow_2) return 0;

  useconds_t base_wait = 1 << start_pow_2;
  std::random_device rd;
  // Wait exponentially longer the more times
  // we've had to retry.
  useconds_t wait = base_wait << retries;
  // Randomize so we don't all retry at the same time.
  wait += rd() & (wait - 1);

  usleep(wait);

  // Tell sqlite to retry
  return 1;
}

These errors always seem to occur at startup and can be triggered more reliably by forking 8 times (creating 256 concurrent processes). I'm not clear on what I'm doing wrong at this point. What's the recommended way to initialize the tables and indexes in a database that will be used by many processes?

(2) By Keith Medcalf (kmedcalf) on 2022-08-11 16:30:43 in reply to 1 [link] [source]

I would use something that is easy to read to test if the database has been initialized. pragma user_version comes to mind.

The general startup procedure would look like this (a rough sketch in the C API follows the list):

  1. open database
  2. check user_version
  3. if user_version is correct goto 11
  4. begin immediate
  5. on success goto 8
  6. maximum wait exceeded? goto catastrophic_error
  7. wait a bit and then goto 2
  8. create database schema
  9. set user_version
  10. commit transaction
  11. carry on fruitfully
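
In the C API that would look roughly like this. It is an untested sketch: init_schema, kSchemaVersion, kSchemaSql, and the retry limits are placeholders to fill in, and error handling is pared down to the essentials.

#include <sqlite3.h>
#include <unistd.h>

// Placeholders: the schema version this build expects and the
// "create ... if not exists" script. The literal in the
// "pragma user_version=1" statement below must match kSchemaVersion.
constexpr int kSchemaVersion = 1;
static const char *kSchemaSql =
    "create table if not exists t1 (id integer primary key, value text);";

static int read_user_version(sqlite3 *db, int *out) {
  sqlite3_stmt *stmt = nullptr;
  int rc = sqlite3_prepare_v2(db, "pragma user_version;", -1, &stmt, nullptr);
  if (rc != SQLITE_OK) return rc;
  rc = sqlite3_step(stmt);
  if (rc == SQLITE_ROW) { *out = sqlite3_column_int(stmt, 0); rc = SQLITE_OK; }
  sqlite3_finalize(stmt);
  return rc;
}

static bool init_schema(sqlite3 *db) {
  for (int attempt = 0; attempt < 50; ++attempt) {            // step 6: bound the total wait
    int version = 0;
    if (read_user_version(db, &version) != SQLITE_OK) return false;
    if (version == kSchemaVersion) return true;               // step 3: already initialized
    int rc = sqlite3_exec(db, "begin immediate;", nullptr, nullptr, nullptr);  // step 4
    if (rc == SQLITE_BUSY) { usleep(100 * 1000); continue; }  // step 7: wait, then re-check
    if (rc != SQLITE_OK) return false;
    // steps 8-9: we hold the write lock; re-check, then create the schema and set the marker
    if (read_user_version(db, &version) != SQLITE_OK ||
        (version != kSchemaVersion &&
         (sqlite3_exec(db, kSchemaSql, nullptr, nullptr, nullptr) != SQLITE_OK ||
          sqlite3_exec(db, "pragma user_version=1;", nullptr, nullptr, nullptr) != SQLITE_OK))) {
      sqlite3_exec(db, "rollback;", nullptr, nullptr, nullptr);
      return false;
    }
    return sqlite3_exec(db, "commit;", nullptr, nullptr, nullptr) == SQLITE_OK;  // step 10
  }
  return false;                                               // step 6: maximum wait exceeded
}

The point is that only the process that actually wins the begin immediate runs the schema script; everyone else just re-reads user_version.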

Secondly, one hopes that you open the database AFTER forking, not before?

(3) By Richard Damon (RichardDamon) on 2022-08-11 16:35:42 in reply to 1 [link] [source]

My first comment is that you are not using the power of the programming language to optimize the startup; you are mostly just using SQL to ignore the multiple-initialization error. This means that ALL the processes actually attempt to initialize the database and need to get the write lock to do so, which creates a lot of contention.

I would instead have each task first check whether the first table exists. If it does, the task can assume that some process has initialized (or is initializing) the system, skip the initialization, and go to a second loop that waits for whoever that was to finish. Make sure you end the transaction and start a new one between each test so you see the latest state.

If it doesn't see the first table, set a short timeout and try to start the write transaction. If that fails, some other task is doing the initialization, so you can go to the final check and wait for it to finish. If you want to be a bit paranoid (or if the same file might be used for different purposes with different sets of tables), you could instead wait a bit and then jump back to the test for the first table.

If the write transaction succeeds, check again for the first table. If it exists, someone beat you to the punch after your first test, so end the transaction and go to the wait-for-completion loop.

Otherwise, you are the one and only process in charge of initializing the database, so you can create the tables and add any initial data, knowing that you are the one in charge, and at the end commit the results for others to see.

If you can do the full creation in a single transaction, then the second loop can be omitted once you have seen the first table, as all the tables will appear together, but you still need some test for the BUSY error on the write transaction.

This method means the processes don't all need to compete for a successful write transaction, which can be the scarce resource for the database.
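
Roughly, in the C API (an untested sketch; the table name t1, the 100 ms timeout, and the poll counts are placeholders), assuming the whole schema is created in one transaction:

#include <sqlite3.h>
#include <unistd.h>

// Each call runs as its own implicit (autocommit) transaction,
// so it always sees the latest committed state.
static bool first_table_exists(sqlite3 *db) {
  sqlite3_stmt *stmt = nullptr;
  bool found = false;
  if (sqlite3_prepare_v2(db,
        "select 1 from sqlite_master where type='table' and name='t1';",
        -1, &stmt, nullptr) == SQLITE_OK) {
    found = (sqlite3_step(stmt) == SQLITE_ROW);
  }
  sqlite3_finalize(stmt);
  return found;
}

static bool init_if_needed(sqlite3 *db, const char *schema_sql) {
  if (first_table_exists(db)) return true;        // already initialized
  sqlite3_busy_timeout(db, 100);                   // short timeout for this attempt only;
                                                   // restore the longer one before real writes
  int rc = sqlite3_exec(db, "begin immediate;", nullptr, nullptr, nullptr);
  if (rc == SQLITE_OK) {
    if (first_table_exists(db)) {                  // someone beat us to it after the first check;
      sqlite3_exec(db, "rollback;", nullptr, nullptr, nullptr);
      return true;                                 // one-transaction schema, so it is complete
    }
    if (sqlite3_exec(db, schema_sql, nullptr, nullptr, nullptr) != SQLITE_OK) {
      sqlite3_exec(db, "rollback;", nullptr, nullptr, nullptr);
      return false;
    }
    return sqlite3_exec(db, "commit;", nullptr, nullptr, nullptr) == SQLITE_OK;
  }
  if (rc != SQLITE_BUSY) return false;
  // Another process holds the write lock, presumably initializing:
  // poll until the first table shows up.
  for (int i = 0; i < 100; ++i) {
    if (first_table_exists(db)) return true;
    usleep(100 * 1000);
  }
  return false;
}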

(4) By JakeEhrlich on 2022-08-11 16:43:03 in reply to 2 [link] [source]

Yeah, I'm forking before opening the database. Forking was just to create a more reliable reproducer for the issue that I'm seeing in practice.

Why does reading first prevent SQLITE_LOCKED from returning? It clearly reduces write contention, but I'm going to have write contention on my actual primary writes, not just on database initialization, so I need to understand why I'm seeing SQLITE_LOCKED in the first place.

(5) By JakeEhrlich on 2022-08-11 16:47:14 in reply to 3 [link] [source]

Right, this would reduce write contention considerably. I do however anticipate having a significant number of concurrent writers that need to do actual writes, not just initialize the database. Those writes don't have to be fast, but they have to succeed in a reasonable amount of time. I need to understand why I'm receiving SQLITE_LOCKED at all, not just optimize the write contention.

(6) By Keith Medcalf (kmedcalf) on 2022-08-11 16:55:44 in reply to 4 [source]

You realize that you cannot do that or all hell will break loose upon your computer?

(7) By Richard Damon (RichardDamon) on 2022-08-11 16:58:24 in reply to 5 [link] [source]

The issue is that you are getting a massive number of processes trying to write at exactly the same time, ALL 256 of them. Without the randomization, only one succeeds at each level of delay, so unless the timeout is long enough that a process retries at least 256 times, the last one just doesn't get in.

Unless your process might generate a similar write storm where ALL the processes want to do a write at exactly the same time, the exponential delay will generally let things succeed.

Even if all 256 processes try to write at approximately the same time, the differences in timing will let them fill in between each other and they will succeed, as long as you aren't pushing your write bandwidth.

It was the synchronized behavior that got you into trouble, and you might want to check that processing all 256 copies of the init code doesn't take more time than your timeout period allows.

(8) By JakeEhrlich on 2022-08-11 17:00:07 in reply to 6 [link] [source]

Hmm, I understood your first question about opening the database after forking to be in agreement with what I responded. Maybe a miscommunication occurred? Sorry about that; I might have said something wrong or been unclear.

To clarify: I first fork, and then open the database. If that's wrong, I don't understand why and would appreciate an explanation. My thinking was that opening the database and then forking would open the gates to hell, not vice versa.

(9) By JakeEhrlich on 2022-08-11 17:13:38 in reply to 7 [link] [source]

So that's consistent with what I saw before I added randomization, but now that I've added it I no longer see that behavior. I am still, however, getting sporadic SQLITE_LOCKED errors from initialization when I run this in practice.

(10.2) By Keith Medcalf (kmedcalf) on 2022-08-11 18:19:35 edited from 10.1 in reply to 1 [link] [source]

Here is a sample initializer. It is in Python but one should be able to glean what it is doing.

import mpsw

db = mpsw.Connection('file:/mydatabase.db?vfs=memdb')

db.executescript('pragma busy_timeout=5000')

waited = 0
while True:
    # if the database is initialized then bug out of here
    if db.execute('pragma user_version').fetchone()[0] == 42:
        break
    # database is not initialized so we need an immediate transaction
    try:
        db.beginimmediate()
    except mpsw.BusyError:
        waited += 1
        if waited > 6:
            raise
    except:
        raise
    # if we are still in autocommit (no transaction) try try again from the top
    if db.autocommit:
        continue
    # Now we have exclusive write access to the database
    # if the database is initialized then rollback the transaction and bug out
    if db.execute('pragma user_version').fetchone()[0] == 42:
        db.rollback()
        break
    # Lets do the initialization and set the indicator
    db.executescript('create ...')
    db.executescript('pragma user_version=42')
    # no excuse will be accepted for inability to commit this transaction
    # so keep trying until we are back in autocommit
    while not db.autocommit:
        try:
            db.commit()
        except:
            pass
    break
# now we know the database is initialized

Minor correction: if you determine that you need to do the initialization, then after obtaining the lock you should re-read the indicator to make sure you really still need to do it, and if not, roll back the transaction that is not needed. This eliminates the race condition between the check and the initialization, ensuring that it is only done once.

And waited needs initializing. Comments added.

(11) By Keith Medcalf (kmedcalf) on 2022-08-11 17:16:13 in reply to 8 [link] [source]

You are correct. I will lay the blame on my crappy eyesight :)

(12) By anonymous on 2022-08-11 17:27:15 in reply to 9 [link] [source]

Set the busy handler before attempting to write in any process.

(13) By Richard Damon (RichardDamon) on 2022-08-11 17:35:04 in reply to 9 [link] [source]

Yes, randomizing helps a lot, but a given process only tries a limited number of times, so the odds of one of the processes hitting contention every time aren't THAT low.

The key is not to require each process to get that exclusive lock, since a process doesn't need it if the database is already set up.

Keith's method of a stored "version number" would be an alternate way of detecting that you don't need to do the setup. If the version number doesn't exist or is too low, try to get the write lock, and if you succeed, check again and update as needed. If the version number is sufficient, skip the init code.

Note: with this method you EXPECT a lot of the processes to get the BUSY code on their attempt (especially if you set the timeout low for this test, as I suggested), but you handle it and try again instead of erroring out. The reason for setting the timeout low for THIS write is that if someone else is doing the write, you won't actually need the write lock; you just need to see that they completed the job. This is different from later writes, where you want to make sure your write actually happens, so you allow for longer delays.

(14) By JakeEhrlich on 2022-08-11 18:06:46 in reply to 13 [link] [source]

I want to make sure we're distinguishing between SQLITE_BUSY and SQLITE_LOCKED. I'm getting SQLITE_LOCKED now, not SQLITE_BUSY, and I don't understand the cause of SQLITE_LOCKED right now. My understanding is that I would only get SQLITE_BUSY if my handler returns 0, which I no longer see happening thanks to randomization. I'm sure that avoiding write contention entirely would also help with that.

I still, after initialization, need to have these processes insert things into the database. That's going to demand an exclusive lock, since the data will be unique for each process. Sure, I can almost entirely avoid contention on initialization, but I think that's a bit of a red herring for my larger problem, right? Is it just not possible to handle a high amount of write contention?

(15) By Richard Damon (RichardDamon) on 2022-08-11 18:23:27 in reply to 14 [link] [source]

Ok, SQLITE_LOCKED is a different sort of error, and indicates a problem on a single connection or shared cache. I would try to find what statement is getting the SQLITE_LOCKED error and figure out what that process is doing that is blocking that operation. (Maybe a transaction wasn’t finished).

Write contention is somewhat limited, as only one write transaction can be in progress at a time and other requests need to wait for it to finish. A bigger problem is if you start with a read transaction and then try to upgrade it to a write: if another write has happened or is happening, the upgrade will fail (so read-modify-writes should start with a BEGIN IMMEDIATE to get the write lock at the start).
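
For example (an untested sketch, with a made-up counters table), the safe shape takes the write lock up front:

#include <sqlite3.h>

// Read-modify-write with the write lock taken up front. If another writer is
// active, BEGIN IMMEDIATE itself returns SQLITE_BUSY, and the busy handler
// can safely retry it.
static bool bump_counter(sqlite3 *db) {
  if (sqlite3_exec(db, "begin immediate;", nullptr, nullptr, nullptr) != SQLITE_OK)
    return false;  // busy handler exhausted, or some other error
  if (sqlite3_exec(db, "update counters set n = n + 1 where id = 1;",
                   nullptr, nullptr, nullptr) != SQLITE_OK) {
    sqlite3_exec(db, "rollback;", nullptr, nullptr, nullptr);
    return false;
  }
  return sqlite3_exec(db, "commit;", nullptr, nullptr, nullptr) == SQLITE_OK;
}
// With a plain (deferred) BEGIN, a SELECT, and then the UPDATE, a commit by
// another process in between makes the upgrade to a write lock fail
// (SQLITE_BUSY, or SQLITE_BUSY_SNAPSHOT in WAL) and the whole transaction has
// to be rolled back and retried.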

(16) By JakeEhrlich on 2022-08-11 18:41:38 in reply to 15 [link] [source]

So it's the exec call shown in the code above that returns the error, per my original post. All the processes are the same when I fork but I've seen this occur in practice too when I have other reads and writes happening concurrently.

I begin all write transactions with an insert so I don't believe I have the upgrade problem.

(17) By anonymous on 2022-08-11 21:29:31 in reply to 1 [link] [source]

Put the database initialisation code in a separate setup program. Wait until the setup process exits with success before allowing any regular processes to be created.