DB read fails while doing parallel bulk DB updates

(1) By mk_cs (karthik.libran) on 2020-05-03 18:33:13 [link] [source]

Our application has 2 threads, thread 1 opens the DB handle with read/write option and thread 2 opens the DB with read only option. sqlite library is compiled with  -DSQLITE_THREADSAFE=2 option

Thread 1 takes care of DB updates (records update can be 200K to 500K updates). In order to have optimal user experiance we have used begin/end transaction to do bulk updates as shown below
    1. Begin transaction 
    2. In a loop insert/update 250 records 
    3. End Transaction
    4. used thread yield, sched_yield(); 
    5. Start from step 1 (we dint want to lock the DB with single begin and end transaction)

Thread 2 is just used to query DB when a HW event occurs and its critical for us to make sure the query is serviced successfully on ALL occasions within 1 second.

While we are doing bulk updates to DB (using thread 1) if we issue select query using thread 2 it passes only 2% of the time, our expectation is since we have separate handles and we have used THREADSAFE=2 option DB read should pass all the time while the other thread is updating the DB. 

If we use a select count query in step 4 of thread 1 then the pass rate of select query 
in thread 2 increases exponentially to 95%.  We are not exactly sure what's going wrong.

How to achieve consistent DB read (response time < 1sec) while bulk DB updates are going on in parallel

(2) By Simon Slavin (slavin) on 2020-05-03 18:55:56 in reply to 1 [link] [source]

sched_yield() is designed to handle concurrent threads. It does not understand database access conflicts.

Have you set a timeout on every connection to the database ?

https://sqlite.org/c3ref/busy_timeout.html

Set them all to 10 seconds and see what happens.

(4) By mk_cs (karthik.libran) on 2020-05-03 19:35:13 in reply to 2 [link] [source]

Just tried sched_yield() to see if my thread 2 gets priority and gets executed successfully, we did try timeout callback but that failed repeatedly for upto 2 seconds. We did not try 10 seconds. SInce we want to execute the select query within 1 second we stopped at 2 seconds.

(3.1) By Keith Medcalf (kmedcalf) on 2020-05-03 19:08:02 edited from 3.0 in reply to 1 [link] [source]

While we are doing bulk updates to DB (using thread 1) if we issue select query using thread 2 it passes only 2% of the time, our expectation is since we have separate handles and we have used THREADSAFE=2 option DB read should pass all the time while the other thread is updating the DB.

Why do you believe that, and what is the error returned?

WAL https://www.sqlite.org/wal.html is what permits reading while writing (but not during checkpointing).

THREADSAFE controls how the single-entrance requirement is to be policed. The default THREADSAFE=1 means that SQLite3 will use mutexes on connections to ensure that single-entrance is enforced, thus taking a few extra nanoseconds on each library entry but preventing all hell from breaking loose. THREADSAFE=2 means that SQLite3 will not use mutexes to ensure that single-entrance requirements are met thus saving those few nanoseconds per library call but will not prevent all hell from breaking loose if you do not properly adhere to the entrance requirements.

THREADSAFE does not affect locking or concurrency.

So, there are a couple of possibilities here:

Your cache_size is too small
Your update is taking too long

In any case, your scheduler yield will do nothing since nothing is waiting on a scheduler yield (this is pre-emptive multitasking, not co-operating multitasking).

If you are sure your cache_size is ample for the changeset then you might want to try compiling with SQLITE_ENABLE_SETLK_TIMEOUT=1 if your system supports file control of lock timeouts. It may or may not solve your problem by changing the locking operations to block on the lock rather than block on a timer to re-poll the lock.

Otherwise (after making sure your cache_size is big enough) use WAL mode (pragma journal_mode=WAL) and set the pragma wal_autocheckpoint=1 (to ensure maximum determinism).

(5) By Keith Medcalf (kmedcalf) on 2020-05-03 19:48:26 in reply to 3.1 [source]

Note that you MUST have a busy timeout set for any of this to work at all.

(6) By mk_cs (karthik.libran) on 2020-05-04 19:19:51 in reply to 5 [link] [source]

Hi Keith

Thanks a lot for the quick and detailed response. Based on your response i tried couple of things. Using WAL mode i was able to get the read in thread 2 to always pass with thread 1 doing DB updates. Then i tried SQLITE_CONFIG_SERIALIZED with busy call back function set that also seems to solve the problem. Sometimes the read query goes for retry upto 2 seconds trying to optimize that now. So out of the two solutions i beleive second is the best one, please let me know your thoughts.

(7.1) By Keith Medcalf (kmedcalf) on 2020-05-04 21:39:28 edited from 7.0 in reply to 6 [link] [source]

This is good to hear. Each of these has a few caveats and there are some relevant assumptions that have to do with the environment in which the application is running and the particular resources it has available.

Firstly, I would recommend leaving SQLITE_CONFIG_SERIALIZED as the default unless you find that it is causing a problem -- and that would likely manifest as say "high CPU usage" or excessive mutex blocking. There is no point in going to all the trouble of trying to use a different serialization mode if it will not provide a benefit.

Secondly, WAL mode provides a trade-off between concurrency and I/O. In particular it increases concurrency (reading while writing) at the cost of increasing the amount of I/O (more or less double the I/O in order to provide the concurrency). In effect this means that your I/O will saturate in half the time it would otherwise, or that if it is already saturated, then it will become "half-as-fast" at it was before. Depending on where your database is stored this could make a big difference -- for example if it is stored on a slow USB NAND device then the effect will be quite noticeable, if the database is stored on NVMe SSD with 4.2 Gbit throughput, not so much. This is because the concurrency is providing by using a "write-aside" log so as not to interrupt reading, however, you will periodically have to "checkpoint" this write-aside buffer back into the actual database file which is an increase in total I/O, not a reduction. The theory is that this "checkpointing" will occur more quickly resulting in a net gain in throughput, but still has a cost of increased I/O as long as the I/O channel is not full.

Thirdly, when NOT using WAL, read operations will busy-wait while database writes are occurring -- that does not mean during the BEGIN/UPDATE/INSERT (assuming that the pragma cache_size is big enough to hold all the changed pages) -- but when the transaction is being actually "COMMIT"ed to the database. Similarly the "COMMIT" will busy-wait until the READ has completed. Therefore, transactions in which you know you will be updating the database should be started with "BEGIN IMMEDIATE" rather than a plain BEGIN, so that the database knows that this transaction is intending to update the database.

Fourthly, the busy handler itself is a polling loop, not a wait. That is, when the "reader" tries to get access to the database it may be denied because the something is "writing" to it (a COMMIT for a not-WAL database, or a checkpoint of a WAL database). When this occurs the busy-handler does not block waiting to acquire the READ lock, rather it enters a polling loop where it sleeps for a time, then checks again for the lock, lather/rinse/repeat until the lock can be acquired. Normally this works OK, but sometimes the particular environment does not support usleep (or does not report properly that it supports usleep) and so the sleep occurs in increments of seconds rather than milliseconds. (This is rare but if your instrumentation is indicating that the sleeping/retry is always occurring for an integral number of seconds, you may be experiencing this).

On some Operating Systems, the default busy-handler can implement "blocking" rather that "polling" through file_control calls. This can be enabled by compiling the sqlite3.c code with the SQLITE_ENABLE_SETLK_TIMEOUT preprocessor symbol defined. This attempts to change the busyhandler loop to use "block waiting to acquire the lock" loops rather than "poll and sleep" loops. It may or may not work on your OS, but it may solve the issue by passing the blocking to the OS scheduler rather than using a polling loop. (In which case a call to yield the current timeslice to the scheduler will work as you would expect because the blocked thread will immediately become unblocked and dispatchable).

This raises another question. You say that the "reading" operation is triggered by an incoming hardware event, though you do not say how long the response process lasts, only that it has response requirements.

Have you considered whether it would be possible to run this as a single database loop like follows:

while True
  begin immediate
  do updates/inserts/whatever batch
  commit
  while events > 0:
    events--
    process event
  end
end

and have the HW event handler merely increment the events counter? If the time taken to "process event" plus the time taken to process the "begin...commit" block were less that a second, you would have met your response requirement and there would not be a possibility of a processing of events taking too long (and you could not have database contention).

(8) By mk_cs (karthik.libran) on 2020-05-05 19:25:53 in reply to 7.1 [link] [source]

Hi Keith

We are running the application under embedded Linux OS 3.10.17 in iMx dual lite with eMMC storage, so our writes are not as fast as SSD but i haven’t seen bad performance with WAL mode during 500K inserts/update (this is our max supported records at this point of time). The response event runs for few milliseconds we just have to check bunch of conditions and activate a GPIO, I have been doing some stress test and observed a new issue

Configuration:
In default journal or WAL mode with SQLITE_CONFIG_SERIALIZED, single open handle to DB and busy callback function configured with timeout value 2 seconds (sqlite3_busy_timeout(db_,2000) )

Issue:
During parallel write from 2 different threads (Thread 1 doing bulk insert/update as mentioned in my above posts (30 to 40K records) and thread 2 doing minor updates (10 to 15 records) to different table in the same DB file) i can see busy handler getting invoked multiple times and eventually both write passes and everything ends well, but when i do a bulk delete from thread 1 (32 to 40 K records by giving a range in a single delete query this will run for 15 to 20 seconds) and minor updates from thread 2 the busy callback handler gets invoked for the same query numerous time beyond the 2 seconds cutoff (it goes close to 13 to 15 seconds, and sometimes this leads to abrupt application crash. I saw you mentioned about usleep issue. We do use usleep in our code and it works well hence i know the function call works but i’m not sure about the reporting part. 

Is there a chance that the callback handler can get into infinite loop if the usleep doesn’t work properly? in that case the best way to get out of the callback is to check the count and have some condition and exit the busy callback handler? Another option i could think of is avoid running such single big delete query instead do the bulk delete in terms of batches of 250 within a begin and end transaction and that might give chance to the second thread.

(9) By anonymous on 2020-05-05 20:46:49 in reply to 8 [link] [source]

Are these updates in the different threads going to neighboring regions in the database? i.e. same table, close by records?

I am asking because you might be interested in checking out the begin-concurrent branch, which should allow both threads to write concurrently (commits are still serialized) iff the the concurrent write transactions are not touching the same pages.

Also regarding the write amplification of the WAL, please also note that in the conventional journal mode a journal gets written for each transaction and depending on how much data you write in each transaction you could potentially end up doing even more writes than in the WAL case.

(10) By mk_cs (karthik.libran) on 2020-05-06 14:46:29 in reply to 9 [link] [source]

The updates are going to different tables within same DB file, here is a more detailed picture on the issue

To make the bulk update and inserts efficient we drop index before write and create indexes after the write operation. Creating indexes takes about 10 to 15 seconds depending on the total record count. During this time we are not able to do any DB updates, DB seems to be locked for the whole duration. Please note that Index is getting created for Table 1 whereas we are doing 3 to 5 update queries on Table 2 which fails. Is there a way to run update queries while creating indexes? (on different tables within same DB file)

(11) By Richard Hipp (drh) on 2020-05-06 16:11:23 in reply to 10 [link] [source]

Is there a way to run update queries while creating indexes?

You can put each table in a separate database file.

Threads that need to see both tables can open the first database file, then ATTACH the second. You can still do JOINs across multiple tables, even if they are in separate database files. But if two database files are involved, you can be doing a CREATE INDEX on one, while doing a simultaneous UPDATE on the other.

You can try that. But in your case, I'm not certain if it will help. It might be that the limiting factor for you is not the number of parallel threads, but rather write bandwidth of your eMMC storage. You can create all the parallel threads you want, but that is not going to make your eMMC go any faster. It is still, worth a try, though.