SQLite Forum

Random crash doing SELECT
Login

Random crash doing SELECT

(1.2) By zero (zeroxia) on 2021-07-22 05:58:29 edited from 1.1 [link] [source]

Hi,

My program uses 1 write connection, 3 read connections. SQLite is open as "SQLITE_OPEN_NOMUTEX".

When a thread needs to access the SQLite, it needs to get a connection by waiting on a condition variable. So effectively once a thread waited successfully, it exclusively owns that connection object.

If a thread wants the write connection, it needs to wait until no read connection is busy, and once write connection is busy, no read connection can be retrieved.

It's been running a while on MacOS, Windows and QNX without problem.

Yesterday I run the program in linux docker containers, some mysterious crash happened.

One is centos 7 container:

statement is:

"SELECT \"tile_blob\" FROM \"tiles_table\"\n WHERE \"tile_meta\" = @tile_meta AND \"tile_id\" = @tile_id;"

Call stack:

Program terminated with signal 11, Segmentation fault.
#0  0x0000000000806ad2 in sqlite3VdbeAddOp3 (p=0x7f84d00279e8, op=96, p1=0, p2=4, p3=0) at /home/jenkins/agent/workspace/nav/thirdparty/develop/linux-centos7x64/thirdparty/sqlite/sqlite-amalgamation/sqlite3.c:77795
77795   /home/jenkins/agent/workspace/nav/thirdparty/develop/linux-centos7x64/thirdparty/sqlite/sqlite-amalgamation/sqlite3.c: No such file or directory.
warning: File "/usr/local/lib64/libstdc++.so.6.0.22-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
To enable execution of this file add
        add-auto-load-safe-path /usr/local/lib64/libstdc++.so.6.0.22-gdb.py
line to your configuration file "/root/.gdbinit".
To completely disable this security protection add
        set auto-load safe-path /
line to your configuration file "/root/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
        info "(gdb)Auto-loading safe path"
Missing separate debuginfos, use: debuginfo-install glibc-2.17-292.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_7.2.x86_64 libcom_err-1.42.9-16.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0  0x0000000000806ad2 in sqlite3VdbeAddOp3 (p=0x7f84d00279e8, op=96, p1=0, p2=4, p3=0) at /home/jenkins/agent/workspace/nav/thirdparty/develop/linux-centos7x64/thirdparty/sqlite/sqlite-amalgamation/sqlite3.c:77795
#1  0x000000000080728f in sqlite3VdbeAddOp4Int (p=0x7f84d00279e8, op=96, p1=0, p2=4, p3=0, p4=5) at /home/jenkins/agent/workspace/nav/thirdparty/develop/linux-centos7x64/thirdparty/sqlite/sqlite-amalgamation/sqlite3.c:78040
#2  0x0000000000846b94 in sqlite3OpenTable (pParse=0x7ffd1e0d6ab0, iCur=0, iDb=0, pTab=0x7f84dc016778, opcode=96) at /home/jenkins/agent/workspace/nav/thirdparty/develop/linux-centos7x64/thirdparty/sqlite/sqlite-amalgamation/sqlite3.c:120293
#3  0x000000000087877c in sqlite3WhereBegin (pParse=0x7ffd1e0d6ab0, pTabList=0x7f84d002afc8, pWhere=0x7f84d002ad48, pOrderBy=0x0, pResultSet=0x7f84d002aac8, wctrlFlags=0, iAuxArg=320)
    at /home/jenkins/agent/workspace/nav/thirdparty/develop/linux-centos7x64/thirdparty/sqlite/sqlite-amalgamation/sqlite3.c:150096
#4  0x000000000085e8f3 in sqlite3Select (pParse=0x7ffd1e0d6ab0, p=0x7f84d002af48, pDest=0x7ffd1e0d5e70) at /home/jenkins/agent/workspace/nav/thirdparty/develop/linux-centos7x64/thirdparty/sqlite/sqlite-amalgamation/sqlite3.c:135208
#5  0x00000000008803fd in yy_reduce (yypParser=0x7ffd1e0d60c0, yyruleno=82, yyLookahead=1, yyLookaheadToken=..., pParse=0x7ffd1e0d6ab0) at /home/jenkins/agent/workspace/nav/thirdparty/develop/linux-centos7x64/thirdparty/sqlite/sqlite-amalgamation/sqlite3.c:157152
#6  0x0000000000884586 in sqlite3Parser (yyp=0x7ffd1e0d60c0, yymajor=1, yyminor=...) at /home/jenkins/agent/workspace/nav/thirdparty/develop/linux-centos7x64/thirdparty/sqlite/sqlite-amalgamation/sqlite3.c:158453
#7  0x000000000088576b in sqlite3RunParser (pParse=0x7ffd1e0d6ab0, zSql=0x149f253 ";", pzErrMsg=0x7ffd1e0d6c58) at /home/jenkins/agent/workspace/nav/thirdparty/develop/linux-centos7x64/thirdparty/sqlite/sqlite-amalgamation/sqlite3.c:159732
#8  0x00000000008531e8 in sqlite3Prepare (db=0x7f84d001d0c8, zSql=0x149f1f0 "SELECT \"tile_blob\" FROM \"tiles_hd_8590\"\n    WHERE \"tile_meta\" = @tile_meta AND \"tile_id\" = @tile_id;", nBytes=101, prepFlags=128, pReprepare=0x0, ppStmt=0x7ffd1e0d6ea8, pzTail=0x0)
    at /home/jenkins/agent/workspace/nav/thirdparty/develop/linux-centos7x64/thirdparty/sqlite/sqlite-amalgamation/sqlite3.c:128665
#9  0x0000000000853410 in sqlite3LockAndPrepare (db=0x7f84d001d0c8, zSql=0x149f1f0 "SELECT \"tile_blob\" FROM \"tiles_hd_8590\"\n    WHERE \"tile_meta\" = @tile_meta AND \"tile_id\" = @tile_id;", nBytes=101, prepFlags=128, pOld=0x0, ppStmt=0x7ffd1e0d6ea8, pzTail=0x0)
    at /home/jenkins/agent/workspace/nav/thirdparty/develop/linux-centos7x64/thirdparty/sqlite/sqlite-amalgamation/sqlite3.c:128737
#10 0x00000000008535e6 in sqlite3_prepare_v2 (db=0x7f84d001d0c8, zSql=0x149f1f0 "SELECT \"tile_blob\" FROM \"tiles_hd_8590\"\n    WHERE \"tile_meta\" = @tile_meta AND \"tile_id\" = @tile_id;", nBytes=101, ppStmt=0x7ffd1e0d6ea8, pzTail=0x0)
    at /home/jenkins/agent/workspace/nav/thirdparty/develop/linux-centos7x64/thirdparty/sqlite/sqlite-amalgamation/sqlite3.c:128821

Another is Ubuntu 1804 container, statement is:

statement is

"SELECT \"tile_blob\" FROM \"tiles_table\"\n WHERE \"tile_meta\" = @tile_meta AND \"tile_id\" = @tile_id;"

``

Call stack:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./ft -ns -tc=... ...'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007eff04f84b03 in sqlite3ApiExit (db=0x40000000000, rc=0) at /home/jenkins/agent/workspace/nav/thirdparty/develop/linux-ubuntu-1804/thirdparty/sqlite/sqlite-amalgamation/sqlite3.c:28090
28090   /home/jenkins/agent/workspace/nav/thirdparty/develop/linux-ubuntu-1804/thirdparty/sqlite/sqlite-amalgamation/sqlite3.c: No such file or directory.
[Current thread is 1 (Thread 0x7eff04b35900 (LWP 47426))]
(gdb) bt
#0  0x00007eff04f84b03 in sqlite3ApiExit (db=0x40000000000, rc=0) at /home/jenkins/agent/workspace/nav/thirdparty/develop/linux-ubuntu-1804/thirdparty/sqlite/sqlite-amalgamation/sqlite3.c:28090
#1  0x00007eff04fbc772 in sqlite3Step (p=0x7efee4026d78) at /home/jenkins/agent/workspace/nav/thirdparty/develop/linux-ubuntu-1804/thirdparty/sqlite/sqlite-amalgamation/sqlite3.c:83508
#2  0x00007eff04fbc913 in sqlite3_step (pStmt=0x7efee4026d78) at /home/jenkins/agent/workspace/nav/thirdparty/develop/linux-ubuntu-1804/thirdparty/sqlite/sqlite-amalgamation/sqlite3.c:83553

The "sqlite-amalgamation" used to build SQLite is:

#define SQLITE_VERSION        "3.32.3"
#define SQLITE_VERSION_NUMBER 3032003
#define SQLITE_SOURCE_ID      "2020-06-18 14:00:33 7ebdfa80be8e8e73324b8d66b3460222eb74c7e9dfd655b48d6ca7e1933cc8fd"

Is there any suggestion to help me to pinpoint the problem? Thanks.

(2) By jose isaias cabrera (jicman) on 2021-07-23 15:08:55 in reply to 1.2 [link] [source]

Hi. You haven't provided much information, so, from what I can quickly come up to debug your problem would be to try to print to a text file or the screen output what is the content of:

"SELECT "tile_blob" FROM "tiles_table"n WHERE "tile_meta" = @tile_meta AND "tile_id" = @tile_id;"

Once you get that, use the sqlite3 tool and connect to the DB and try pasting that result on the tool and see if you get a result, or if it also crashes. Other than this, I don't know how else to help you.

josé

(3) By Rico Mariani (rmariani) on 2021-07-23 17:41:25 in reply to 1.2 [source]

It's very unlikely that sqlite3VdbeAddOp3 has a bug like this. I can suggest a few things that might help:

  • first try a variety of experiments with your load on this system but without any of the concurrency going on

  • simplify your concurrency for test purposes so that it's ultra dumb, just one person totally exclusive

  • if that works, expand the concurrency options slowly and look to see where things start to go wrong

The most likely situation is that on this system you aren't getting the concurrency guarantees that you think you're getting or that you want. That will be hard to diagnose and it could cause bizarre failures that are nearly impossible to figure out. Sanity check the stuff and then add slowly for clues as to where things start going wrong.

The likelihood is that whatever is going wrong has very little to do with SQLite per se.

(4) By Keith Medcalf (kmedcalf) on 2021-07-25 03:51:10 in reply to 1.2 [link] [source]

Docker is likely the problem.

It is known to have a defective file system that does not provide the facilities necessary for SQLite3 to work in a concurrent access environment.

This is especially likely since it is "the thing that changed" which transmorgified something which was "working" into something which was not.

(5) By zero (zeroxia) on 2021-08-10 06:19:50 in reply to 4 [link] [source]

Thank you all for your replies.

Docker environment seems easier to reproduce. But on MacOS, it can also happen, after running for a long time, e.g. 2 hours or 2 days.

On QNX-700 and Windows, same testing program, not seen yet, I've run them like several days.

What can be observed:

  1. It always has bad read access (EXC_BAD_ACCESS), for address like 0x0000000100000001

  2. It always happen in read (i.e. SQL select)

  3. It always happen inside SQLite

  4. When crash happens, if you inspect each thread, it is often only one thread is running SQLite code.

  5. It usually happens at "prepare", but not always.

One thing I did for calling prepare, I include the trailing '\0' as the string buffer length to SQLite, it this wrong?

   244      template <typename T, typename = typename std::enable_if<std::is_same<T, boost::string_view>::value || std::is_same<T, std::string>::value>::type>
   245      int prepare(const T& s)
   246      {
-> 247          return sqlite3_prepare_v2(db_, s.data(), static_cast<int>(s.size()) + 1, &stmt_, nullptr);
   248      }

Is it because of the the way I use SQLite is not correct?

It is as follows:

  1. Open one "write" connection: static constexpr int OPEN_WRITE_FLAGS = SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE | SQLITE_OPEN_NOMUTEX;

  2. Open three "write" connections: static constexpr int OPEN_READ_FLAGS = SQLITE_OPEN_READONLY | SQLITE_OPEN_NOMUTEX;

  3. If a thread wants to have write access, it waits untill all connections are idle, then it can get the write connection, and use it.

  4. If a thread wants to have read access, it waits until write connection is idle, and one of three read connection is idle, then it can get the read connection, and use it.

Today on the following environment, I run the tests:

  1. MacOS, 10.14.5
  2. Xcode 11.3.1
  3. clang version:
    Apple clang version 11.0.0 (clang-1100.0.33.17)
    Target: x86_64-apple-darwin18.6.0
    Thread model: posix
    InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
    

I build both libsqlite and my program with address sanitizer enabled.

When crash happens, still not much helpful:


Process 3213 stopped
* thread #2, name = 'tasdk.strm.dl', stop reason = EXC_BAD_ACCESS (code=2, address=0x100000001)
    frame #0: 0x00000001017393b4 hdmap-tools`tokenExpr(pParse=0x000070000135e000, op=153, t=(z = "@tile_meta AND \"tile_id\" = @tile_id;", n = 10)) at sqlite3.c:153692:13
   153689           Expr *p = sqlite3DbMallocRawNN(pParse->db, sizeof(Expr)+t.n+1);
   153690           if( p ){
   153691             /* memset(p, 0, sizeof(Expr)); */
-> 153692             p->op = (u8)op;
   153693             p->affExpr = 0;
   153694             p->flags = EP_Leaf;
   153695             ExprClearVVAProperties(p);
Target 0: (hdmap-tools) stopped.
(lldb) p p
(Expr *) $0 = 0x0000000100000001
(lldb) p op
(int) $1 = 153
(lldb) p p->op
(u8) $2 = ''
(lldb) p p->op = 153
(u8) $3 = '\x99'
(lldb) p p->op
(u8) $4 = '\x99'
(lldb) p p
(Expr *) $5 = 0x0000000100000001
(lldb) bt
* thread #2, name = 'tasdk.strm.dl', stop reason = EXC_BAD_ACCESS (code=2, address=0x100000001)
  * frame #0: 0x00000001017393b4 hdmap-tools`tokenExpr(pParse=0x000070000135e000, op=153, t=(z = "@tile_meta AND \"tile_id\" = @tile_id;", n = 10)) at sqlite3.c:153692:13
    frame #1: 0x000000010172592a hdmap-tools`yy_reduce(yypParser=0x000070000135d150, yyruleno=176, yyLookahead=44, yyLookaheadToken=(z = "AND \"tile_id\" = @tile_id;", n = 3), pParse=0x000070000135e000) at sqlite3.c:157591:28
    frame #2: 0x0000000101719346 hdmap-tools`sqlite3Parser(yyp=0x000070000135d150, yymajor=44, yyminor=(z = "AND \"tile_id\" = @tile_id;", n = 3)) at sqlite3.c:158453:15
    frame #3: 0x00000001015c234d hdmap-tools`sqlite3RunParser(pParse=0x000070000135e000, zSql="AND \"tile_id\" = @tile_id;", pzErrMsg=0x000070000135dfe0) at sqlite3.c:159732:5
    frame #4: 0x00000001016f8727 hdmap-tools`sqlite3Prepare(db=0x0000617000010700, zSql="SELECT \"timestamp\" FROM \"tiles_hd_10228\"\n    WHERE \"tile_meta\" = @tile_meta AND \"tile_id\" = @tile_id;", nBytes=102, prepFlags=128, pReprepare=0x0000000000000000, ppStmt=0x000070000135f
118, pzTail=0x0000000000000000) at sqlite3.c:128665:5
    frame #5: 0x00000001015bd52a hdmap-tools`sqlite3LockAndPrepare(db=0x0000617000010700, zSql="SELECT \"timestamp\" FROM \"tiles_hd_10228\"\n    WHERE \"tile_meta\" = @tile_meta AND \"tile_id\" = @tile_id;", nBytes=102, prepFlags=128, pOld=0x0000000000000000, ppStmt=0x000070000135
f118, pzTail=0x0000000000000000) at sqlite3.c:128737:10
    frame #6: 0x00000001015bb7e6 hdmap-tools`sqlite3_prepare_v2(db=0x0000617000010700, zSql="SELECT \"timestamp\" FROM \"tiles_hd_10228\"\n    WHERE \"tile_meta\" = @tile_meta AND \"tile_id\" = @tile_id;", nBytes=102, ppStmt=0x000070000135f118, pzTail=0x0000000000000000) at sqlite3
.c:128821:8
    frame #7: 0x0000000100d07d36 hdmap-tools`int SqliteStatement::prepare<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, void>(this=0x000070000135f110, s="SELECT \"timestamp\" FROM \"tiles_hd_10228\"\n    WHERE \"tile_m
eta\" = @tile_meta AND \"tile_id\" = @tile_id;") at sqlite_manager.h:247:16

At this moment, only one thread is executing SQLite code:

(lldb) thread list
Process 3213 stopped
  thread #1: tid = 0x5bc4, 0x00000001095d1deb libclang_rt.asan_osx_dynamic.dylib`__asan_memcpy + 715, queue = 'com.apple.main-thread'
* thread #2: tid = 0x5bd1, 0x00000001017393b4 hdmap-tools`tokenExpr(pParse=0x000070000135e000, op=153, t=(z = "@tile_meta AND \"tile_id\" = @tile_id;", n = 10)) at sqlite3.c:153692:13, name = 'tasdk.strm.dl', stop reason = EXC_BAD_ACCESS (code=2, address=0x100000001)
  thread #3: tid = 0x5bd0, 0x00007fff64e2d61a libsystem_kernel.dylib`__select + 10, name = 'tasdk.httpclient'
  thread #4: tid = 0x5bd2, 0x00007fff64e2986a libsystem_kernel.dylib`__psynch_cvwait + 10, name = 'tasdk.strm.stat'
  thread #5: tid = 0x5bda, 0x00007fff64e27bfe libsystem_kernel.dylib`__workq_kernreturn + 10

I'll change the read connection count from 3 to 1 to test it.

(6.1) By zero (zeroxia) on 2021-08-16 08:10:04 edited from 6.0 in reply to 4 [link] [source]

Deleted

(7) By Rowan Worth (sqweek) on 2021-08-17 03:01:07 in reply to 5 [link] [source]

It sounds like you are sharing connections between threads, in which case using SQLITE_OPEN_NOMUTEX is an especially bad idea. Even if you think you can prove that only one thread is using a particular connection at a time, the mutex might be the only thing enforcing a memory barrier.

(8) By zero (zeroxia) on 2021-08-19 08:26:29 in reply to 7 [link] [source]

I believe that is the cause!

The SQLite documentation is kind of obscure when comes to concurrent usage.

I've done a lot of experiments and just realize recently that I might just change the open flag from "SQLITE_OPEN_NOMUTEX" to "SQLITE_OPEN_FULLMUTEX".

Today I did the change and re-compiled the program, no crash now.

Citing the documentation of flag "SQLITE_OPEN_NOMUTEX":

The new database connection will use the "multi-thread" threading mode. This means that separate threads are allowed to use SQLite at the same time, as long as each thread is using a different database connection.

I just remembered that "NOMUTEX" can be used to run SQLite in "multi-thread" mode (the names are really confusing).

And the last sentence says "as long as each thread is using a different database connection", I don't think this means this: each thread should have its own exclusive connection. So it leads me to suppose as long as only one thread is using one connection at a time, it's OK.

It turns out that if one connection is used by thread 1, after that, then it's used by thread 2, then there is problem.

This can be observed from my experiment, I set the open read connection to 1, but with multiple threads using the same connection to do reading, although my application has mutex to synchronize the using of such connection, the crash would always happen.

There is also a question on stackoverflow confirming this:

https://stackoverflow.com/questions/10079552/what-do-the-mutex-and-cache-sqlite3-open-v2-flags-mean

From the answer:

SQLITE_OPEN_NOMUTEX does compile (or active on sqlite3_open_v2), the usage of SQLite without internal mutex. Still safe to be use SQLite multithread BUT in my experience work only safely with individual connection open in each thread. Do NOT try to share a connection between Thread with this mode.

I hope SQLite can be a little bit more verbose and precise on documentation and that could help prevent people from learning it the hard way...

(9) By Keith Medcalf (kmedcalf) on 2021-08-19 18:18:59 in reply to 8 [link] [source]

The SQLite documentation is kind of obscure when comes to concurrent usage.

In what what way?

It is quite clear that a connection is serially entrant on multiple threads, and that you cannot enter sqlite3 using the same connection on separate threads simultaneously.

It is also clear that in order to ensure these requirements are followed there is a specific configuration option for your use to help YOU comply with the requirements. Not you, God (she's black you know), nor anyone else can CHANGE the requirements, but you can change how YOU will comply with them.

Method 1: USE ONLY A SINGLE THREAD

Configure SQLite3 for "single thread" operation. This removes all protections and capability to use SQLite3 with multiple threads by removing or disabling ALL shared data MUTEXes.

Method 2: USE MULTIPLE THREADS, BELTS, and SUSPENDERS

This is the default configuration of SQLite3. You are permitted to use SQLite3 with multiple threads. The requirements are unchanged. SQLite3 will ENSURE that YOU behave in accordance with the requirements for serial entry by actively managing a set of MUTEX semaphores to PREVENT YOU from violating the requirements. This is the default mode. SERIALIZED. FULLMUTEX.

Method 3: USE MULTIPLE THREADS WITHOUT BELT AND SUSPENDERS

Configure SQLite3 for MULTITHREAD operation or open a connection with NOMUTEX. YOU must ensure that you comply with the requirements or all hell will break loose (which may entail a wide range of behaviours ranging from merely strange corruption issues to crashes all the way up to and including the destruction of the multiverse).

Method 1 is the default. It will ALWAYS work properly. You may save a few nanoseconds per entry into the SQLite3 code by specifying one of the other methods IF AND ONLY IF THEY APPLY. Under no circumstance whatsoever may YOU violate the requirements.

Basically, if something does not work properly when using some configuration OTHER THAN the default SERIALIZED / FULLMUTEX, but works properly when using the default configuration, that means that YOU are not complying with the entry requirements YOU have stated you will comply with.

(10) By Larry Brasfield (larrybr) on 2021-08-19 19:18:09 in reply to 8 [link] [source]

I hope SQLite can be a little bit more verbose and precise on documentation and that could help prevent people from learning it the hard way...

We make every reasonable effort to enable those who read and heed the docs to avoid being schooled by hard knocks.

My problem with your plaintive hope is that I do not see how to (attempt to) block your confusion without simply putting some of what is already said in bold or repeating it in the "pay attention here:"a section.

[on SQLITE_OPEN_NOMUTEX effect] And the last sentence says "as long as each thread is using a different database connection", I don't think this means this: each thread should have its own exclusive connection. So it leads me to suppose as long as only one thread is using one connection at a time, it's OK.

It would be ok, if "only one thread is using one connection at a time" means that reentrant execution is blocked by means external to the SQLite library. But the fact that you see a functional change between using "SQLITE_OPEN_NOMUTEX" and "SQLITE_OPEN_FULLMUTEX" means your code did not actually manage to avoid the reentrancy. I do not see any documentation deficiency here; just a failure to adhere to the requirements you correctly read or inferred from the documentation.


a. There is no such section. Having one would be akin to putting something like "These requirements shall be met." into a specification.

(11) By Keith Medcalf (kmedcalf) on 2021-08-19 19:39:41 in reply to 9 [link] [source]

(12) By zero (zeroxia) on 2021-08-23 06:18:51 in reply to 10 [link] [source]

I'm still confused by your extensive response.

Suppose the application is following:

  1. 2 threads: T1, T2
  2. One mutex: M.
  3. Opened one connection to an SQLite database: C, with flag SQLITE_OPEN_NOMUTEX.

Now for a thread to access the database, it needs to:

  1. Acquire lock on M.
  2. Execute SQL on C: the execution is always BEGIN TRANSACTION; and then do some stuff, and finally COMMIT.
  3. Release lock on M.

In this situation, random crash could happen on some of my target platforms, like Linux CentOS 7 and MacOS 10.14. If I inspect the thread stacks, it's only one thread inside SQLite code where BAD ACCESS happens, the rest threads are usually just sleeping. (Strangely on Windows and QNX, even I run the program like 2 days, no crash happened yet.)

My conclusion from my experience is that SQLITE_OPEN_NOMUTEX does not support this, because this scenario means both T1 and T2 would access C, although there is no concurrent accessing to C because application protects it with M.

Your reply suggests this scenario IS supported, and the crash is probably due to my code violating the concurrent requirement.

I suppose the requirement is that to use SQLITE_OPEN_NOMUTEX, both T1 and T2 needs to open their own connection (e.g. C1 and C2) to the database, and a thread needs to stick to using its own connection.

(13) By Keith Medcalf (kmedcalf) on 2021-08-23 07:35:27 in reply to 12 [link] [source]

My conclusion from my experience is that SQLITE_OPEN_NOMUTEX does not support this, because this scenario means both T1 and T2 would access C, although there is no concurrent accessing to C because application protects it with M.

My conclusion is that your code is broken.

If you use SQLITE_OPEN_FULLMUTEX and the crashes go away, then the problem is isolated to your re-implementation of SQLITE_OPEN_FULLMUTEX not being sufficient and/or not working as you think it does.

If you use SQLITE_OPEN_FULLMUTEX and the crashes do not go away, that would indicate that you either found a bug (which needs to be fixed) or that your code is tromping on shit it owns not and causing the sewers to back up.

(14) By Larry Brasfield (larrybr) on 2021-08-23 11:39:05 in reply to 12 [link] [source]

Your reply suggests this scenario IS supported, and the crash is probably due to my code violating the concurrent requirement.

Exactly. I can state it no better than Keith's "If you use SQLITE_OPEN_FULLMUTEX and the crashes go away, then the problem is isolated to your re-implementation of SQLITE_OPEN_FULLMUTEX".

I suppose the requirement is that to use SQLITE_OPEN_NOMUTEX, both T1 and T2 needs to open their own connection (e.g. C1 and C2) to the database, and a thread needs to stick to using its own connection.

That's not precisely the requirement. Different threads may use a shared connection IF AND ONLY IF they do not execute (most) SQLite code concurrently. Without the protection against reentrancy effected when the library guards against such using a mutex, prevention of conflicting access to its data structures (including the DB itself) is up to the client code.

The reason that SQLITE_OPEN_NOMUTEX exists is that multithreaded applications often have their own guards against deleterious reentrancy for their own data structures which may include something like a SQLite connection. In such cases, fiddling with a (likely additional) mutex would be superfluous and a waste of code space and time.

(15) By J.M. Aranda (JMAranda) on 2021-08-23 12:54:53 in reply to 1.2 [link] [source]

Remember guys. Theads are Evil. We all know it.
There is nothing 100x100 safe against that.
But we continue to use them.
Those who repeat exactly the same thing hoping for a different result are very loved by the Gods.