Multi-threaded non-serialization

(1) By NoahV (nvrudny) on 2020-10-21 03:42:37 [link] [source]

I figured I had better ask before I goof up my project too badly. I have run into an interesting problem relating to concurrency in a multi-threaded application and I wanted to make sure I am not missing a key detail.

Let me illustrate my setup: I have multiple threads running concurrently performing simple SELECTs on a single table.

If I config for multi-threaded, setup one connection per thread as read-only with no mutex, and the default private cache, I see something like this as a result.

|
|----|__(query exec)__|  
|
|----|__(query exec)__________________|  
|
|----|__(query exec)__________________|  
|

The queries all begin concurrently, but only the first thread completes in the time that I expect it to. All subsequent threads/connections take orders of magnitude longer to complete (eg. expecting ~250 us, and subsequent threads are ~2-3 ms)

After pouring over the docs, especially everything related to isolation, I enabled shared cache and enabled "PRAGMA read_uncommitted". (still running multi-threaded, one connection per thread, with no mutex)

|
|----|__(query exec)__|  
|
|---------------------|__(query exec)__|  
|
|--------------------------------------|__(query exec)__|  
|

Now the query execution time is short as expected and the same across threads, 
however it is serialized. Reading through the docs:
"sharedcache -> 2.3. Schema (sqlite_schema) Level Locking", I would guess that I am hitting the schema read-lock and I am not able to turn that off even though I am performing read-only operations exclusively during this time.

So finally my question: Is their any mode, combination of options, compile-time variable that will disable the schema lock and allow non-serialized multi-thread execution? Especially since I can guarantee that all queries are read-only during this time.

Bonus question: Why when using a private cache do subsequent threads have such a large increase in execution time? I assume, without really digging through code, that the private cache is being rebuilt for some reason on those threads. Could I determine this, and possibly force each thread to build cache once?

I appreciate any insight.

(2) By Gunter Hick (gunter_hick) on 2020-10-21 06:55:52 in reply to 1 [link] [source]

Note that setting the NOMUTEX flag enables "multi thread" mode, which allows concurrent access by multiple threads, each using their own connection.

The pattern you describe for your first attempt is consistent with running into a database lock and a busy handler sleeping for a bit (millisecond range) before trying again. You should not be running into a database lock for concurrent read transactions. Could you be inadvertently setting up write transactions instead?

The pattern you describe for your second attempt should also allow concurrent reads, but is consistent with write transactions. Especially as read_uncommitted is supposed to turn off read locks, but your threads are obviously blocked.

The increase is not "execution time" but rather "wait time" due to ms granularity of the default busy handler, activated by setting a busy timeout.

(3) By Keith Medcalf (kmedcalf) on 2020-10-21 08:46:08 in reply to 2 [link] [source]

Note that setting the NOMUTEX flag enables "multi thread" mode, which allows concurrent access by multiple threads, each using their own connection.

This is incorrect. Unless you have set single-thread mode you can use multiple threads. You can always use one connection per thread without incurring serialized connection access waits except in single-thread mode.

There are two levels of mutex protection.

One set of mutexes protect the "connection" related data (which includes statements, since they are children of a connection) from multiple simultaneous access by multiple threads.

A second set of mutexes protect the "global data" which may be accessed by all connections (not connection specific).

Setting "multi-thread" / NOMUTEX disables the mutexes on the connection.
Setting "single-thread" disables all mutexes.

Disabling mutexes simply saves the few nanoseconds that it takes to check the mutex ownership, the associated kernel transitions, and (possibly) running kernel dispatcher.

(4.1) By Gunter Hick (gunter_hick) on 2020-10-21 09:48:20 edited from 4.0 in reply to 3 [link] [source]

My point was that NOMUTEX does not disable all mutices, even though it is called NO MUTEX.

And that setting NOMUTEX is safe only if you never access any given connection from more than one thread at any given time, which condition my be fulfilled by having a dedicated connection for each thread.

(6) By NoahV (nvrudny) on 2020-10-21 14:10:58 in reply to 2 [link] [source]

Yes, I absolutely agree my first attempt should have worked. However I see this behavior even with the simplest read query ( SELECT value FROM table WHERE id=1 ). Running that exact query concurrently exhibits this behavior.

My table is indexed, could that be triggering a write in some way? Perhaps there are some other statistics that need to be turned off, such as "last row id" other usage relating to the previous query?

And yes, thank you for clarifying that it is a "wait time". I was wondering why the increase was so consistent. It looks like I can set that sleep value to be zero, but I will most likely just end up with a pattern in my second attempt.

(5) By Dan Kennedy (dan) on 2020-10-21 11:33:14 in reply to 1 [link] [source]

If I config for multi-threaded, setup one connection per thread as read-only with no mutex, and the default private cache, I see something like this as a result.

That should work for two or three threads at least.

Don't use shared-cache mode. Shared-cache mode allows less concurrency than private-cache mode.

Another thing to do is to turn off memory statistics, either by building with -DSQLITE_DEFAULT_MEMSTATUS=0 or using "sqlite3_config(SQLITE_CONFIG_MEMSTATUS, 0);" at run time.

(7) By NoahV (nvrudny) on 2020-10-21 14:14:04 in reply to 5 [link] [source]

Thanks, yes I have disabled the mem-status. Are there any other statistics that can be disabled, either at compile time or run-time? I'm trying everything I can to squeeze down this timing as much as possible.

(8) By anonymous on 2020-10-22 12:19:00 in reply to 7 [link] [source]

Have you tried running the queries from different processes? In my testing these did scale pretty well. I was using mmap and was seeing good scaling up to 16 single threaded processes that were all querying the database none stop. That was done on an anemic VM and my test db was quite larger than RAM so probably could scale further if I had access to more cpu power.

(9) By NoahV (nvrudny) on 2020-10-22 16:41:47 in reply to 1 [link] [source]

After much head banging, I found it! Thanks to all for the hints, it is a File lock issue.

This is from the amalgamation: in the function "winLock" (since I am in a windows environment) about line 44954 there is this bit of code

/* Lock the PENDING_LOCK byte if we need to acquire a PENDING lock or
  ** a SHARED lock.  If we are acquiring a SHARED lock, the acquisition of
  ** the PENDING_LOCK byte is temporary.
  */
  newLocktype = pFile->locktype;
  if( pFile->locktype==NO_LOCK
   || (locktype==EXCLUSIVE_LOCK && pFile->locktype<=RESERVED_LOCK)
  ){
    int cnt = 3;
    while( cnt-->0 && (res = winLockFile(&pFile->h, SQLITE_LOCKFILE_FLAGS,
                                         PENDING_BYTE, 0, 1, 0))==0 ){
      /* Try 3 times to get the pending lock.  This is needed to work
      ** around problems caused by indexing and/or anti-virus software on
      ** Windows systems.
      ** If you are using this code as a model for alternative VFSes, do not
      ** copy this retry logic.  It is a hack intended for Windows only.
      */
      lastErrno = osGetLastError();
      OSTRACE(("LOCK-PENDING-FAIL file=%p, count=%d, result=%d\n",
               pFile->h, cnt, res));
      if( lastErrno==ERROR_INVALID_HANDLE ){
        pFile->lastErrno = lastErrno;
        rc = SQLITE_IOERR_LOCK;
        OSTRACE(("LOCK-FAIL file=%p, count=%d, rc=%s\n",
                 pFile->h, cnt, sqlite3ErrName(rc)));
        return rc;
      }
      if( cnt ) sqlite3_win32_sleep(1);
    }

If two or more threads attempt to set the "PENDING" bit at the same time, obviously all but the first will fail. That in turn hits that sleep call at the end, which is where the consistent 1-2ms timing issue was showing up.

So.... I would challenge the design of this area. Why, when we only want to obtain a shared lock are we setting the pending bit, then clearing it a few steps later??

For my purposes, I am changing the condition to: if( locktype==EXCLUSIVE_LOCK && pFile->locktype<=RESERVED_LOCK )
That way, ONLY when I am intending to write to the file, will I request a pending lock. It works for me. I am by no means an expert on this library, but I would put a little more thought into the design here, or at the very least decrease the sleep into sub-millisecond timing.

(10) By Dan Kennedy (dan) on 2020-10-22 17:01:34 in reply to 9 [link] [source]

You could also avoid that function by using wal mode.

(11) By Max (Maxulite) on 2020-10-27 10:08:11 in reply to 9 [source]

This temporal pending in case of reading claim is explained in the docs ("No new SHARED locks are permitted against the database if a PENDING lock is active, though existing SHARED locks are allowed to continue. "). So if a reader can termporary aquire a PENDING lock, then it can be granted SHARED one. So this trying is necessary, I'd recommend not to modify it if you have plans for complex reading/writing logic.

But by implementing this hack, the developers probably focused on the Window-specific obscacles (anti-virus and indexing), but not on the concurrent lock requests as in your case. If the developer might invest some time in this windows-only workaround, it can be remade using event waiting functions with 2-3 ms time out. I doubt it will have high priority so in your case you might as well invest your time in a little modification to winlock* family of functions to support lpOverlapped.hEvent