SQLite Forum

sqlite3BtreeHoldsMutex: Assertion error
Login

sqlite3BtreeHoldsMutex: Assertion error

(1) By Burak Arslan (burakarslan) on 2020-09-10 23:38:50 [link]

Hello,

We are trying to narrow down a strange behaviour that involves temp triggers on non-temp tables. We have a couple of multi-threaded tests that is supposed to hunt down race conditions.

In our codebase, write access is regulated by a global static mutex that ensures that at most one thread has write access to the db at any given time. We rely on sqlite to figure out concurrent read access. We are using the wal mode.

One of the tests contains a CREATE TEMP TRIGGER statement of the following form:

```
CREATE TEMP TRIGGER some_func_i AFTER INSERT ON some_nontemp_table FOR EACH ROW
BEGIN
    select some_func();
END
```

where some_func is a custom function registered prior using sqlite3_create_function

When this statement is enabled;

1. The following happens with sqlite 3.31.1:

    ```
    test_message_mt: sqlite3.c:63952: sqlite3BtreeHoldsMutex: Assertion `p->sharable==0 || p->locked==0 || sqlite3_mutex_held(p->pBt->mutex)' failed.
    ```

    Here's the full backtrace: 

    ```
    1  __GI_raise              raise.c:50
    2  __GI_abort              abort.c:79
    3  __assert_fail_base      assert.c:92
    4  __GI___assert_fail      assert.c:101
    5  sqlite3BtreeHoldsMutex  sqlite3.c:63952
    6  sqlite3Prepare          sqlite3.c:127552
    7  sqlite3LockAndPrepare   sqlite3.c:127658
    8  sqlite3Prepare16        sqlite3.c:127807
    9  sqlite3_prepare16_v2    sqlite3.c:127853

    (...) 

    29 main
    ```

    This is pretty consistent, always failing at the same place.

2. Random segfaults happen with 3.32.1 in random places. Even with -fsanitize=address, the results are fairly inconsistent. Here are some of the stack traces we got:

    ```
    AddressSanitizer:DEADLYSIGNAL
    =================================================================
    AddressSanitizer:DEADLYSIGNAL
    ==24035==ERROR: AddressSanitizer: SEGV on unknown address 0x620064616572 (pc 0x7f0e29e94161 bp 0x000000000003 sp 0x7f0e04fd5b60 T19)
    ==24035==The signal is caused by a READ memory access.
        #0 0x7f0e29e94160 in sqlite3DbMallocRawNN /usr/src/debug/dev-db/sqlite-3.32.3-r1/sqlite-src-3320300-abi_x86_64.amd64/sqlite3.c:27877
        #1 0x7f0e29e97db2 in sqlite3KeyInfoAlloc /usr/src/debug/dev-db/sqlite-3.32.3-r1/sqlite-src-3320300-abi_x86_64.amd64/sqlite3.c:130294
        #2 0x7f0e29ebd223 in sqlite3KeyInfoOfIndex /usr/src/debug/dev-db/sqlite-3.32.3-r1/sqlite-src-3320300-abi_x86_64.amd64/sqlite3.c:115182
        #3 0x7f0e29ebd2d1 in sqlite3VdbeSetP4KeyInfo /usr/src/debug/dev-db/sqlite-3.32.3-r1/sqlite-src-3320300-abi_x86_64.amd64/sqlite3.c:78950
        #4 0x7f0e29ebdb5a in sqlite3OpenTableAndIndices /usr/src/debug/dev-db/sqlite-3.32.3-r1/sqlite-src-3320300-abi_x86_64.amd64/sqlite3.c:122721
        #5 0x7f0e29f24884 in sqlite3Insert /usr/src/debug/dev-db/sqlite-3.32.3-r1/sqlite-src-3320300-abi_x86_64.amd64/sqlite3.c:121255
        #6 0x7f0e29f29317 in yy_reduce /usr/src/debug/dev-db/sqlite-3.32.3-r1/sqlite-src-3320300-abi_x86_64.amd64/sqlite3.c:157563
        #7 0x7f0e29f2a7fc in sqlite3Parser /usr/src/debug/dev-db/sqlite-3.32.3-r1/sqlite-src-3320300-abi_x86_64.amd64/sqlite3.c:158501
        #8 0x7f0e29f2aa99 in sqlite3RunParser /usr/src/debug/dev-db/sqlite-3.32.3-r1/sqlite-src-3320300-abi_x86_64.amd64/sqlite3.c:159780
        #9 0x7f0e29f2ae6e in sqlite3Prepare /usr/src/debug/dev-db/sqlite-3.32.3-r1/sqlite-src-3320300-abi_x86_64.amd64/sqlite3.c:128705
        #10 0x7f0e29f2afef in sqlite3LockAndPrepare /usr/src/debug/dev-db/sqlite-3.32.3-r1/sqlite-src-3320300-abi_x86_64.amd64/sqlite3.c:128777
        #11 0x7f0e29f2cfa0 in sqlite3Prepare16 /usr/src/debug/dev-db/sqlite-3.32.3-r1/sqlite-src-3320300-abi_x86_64.amd64/sqlite3.c:128926
        #12 0x7f0e29f2d074 in sqlite3_prepare16_v2 /usr/src/debug/dev-db/sqlite-3.32.3-r1/sqlite-src-3320300-abi_x86_64.amd64/sqlite3.c:128972

        (...)

        #25 0x7f0e281e6ef9 in start_thread (/lib64/libpthread.so.0+0x7ef9)
        #26 0x7f0e26730cfe in clone (/lib64/libc.so.6+0xf9cfe)

    AddressSanitizer can not provide additional info.
    SUMMARY: AddressSanitizer: SEGV /usr/src/debug/dev-db/sqlite-3.32.3-r1/sqlite-src-3320300-abi_x86_64.amd64/sqlite3.c:27877 in sqlite3DbMallocRawNN

    ==24035==ABORTING
    ```

    No segfault but test fails, process exits with following:

    ```
    =================================================================
    ==29737==ERROR: LeakSanitizer: detected memory leaks

    Direct leak of 87464 byte(s) in 2 object(s) allocated from:
        #0 0x7fc8636664c8 in __interceptor_malloc (/usr/lib/gcc/x86_64-pc-linux-gnu/9.3.0/libasan.so.5+0x10c4c8)
        #1 0x7fc8630edf5a in sqlite3MemMalloc /usr/src/debug/dev-db/sqlite-3.32.3-r1/sqlite-src-3320300-abi_x86_64.amd64/sqlite3.c:23640

    Indirect leak of 8 byte(s) in 1 object(s) allocated from:
        #0 0x7fc8636664c8 in __interceptor_malloc (/usr/lib/gcc/x86_64-pc-linux-gnu/9.3.0/libasan.so.5+0x10c4c8)
        #1 0x7fc8630edf5a in sqlite3MemMalloc /usr/src/debug/dev-db/sqlite-3.32.3-r1/sqlite-src-3320300-abi_x86_64.amd64/sqlite3.c:23640

    SUMMARY: AddressSanitizer: 87472 byte(s) leaked in 3 allocation(s).
    ```

    ```
    AddressSanitizer:DEADLYSIGNAL
    =================================================================
    ==30633==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f3c5d609fec bp 0x618000201480 sp 0x7f3c3f808da0 T7)
    ==30633==The signal is caused by a READ memory access.
    ==30633==Hint: address points to the zero page.
        #0 0x7f3c5d609feb in sqlite3ExprDeleteNN /usr/src/debug/dev-db/sqlite-3.32.3-r1/sqlite-src-3320300-abi_x86_64.amd64/sqlite3.c:100640
        #1 0x7f3c5d60a02b in sqlite3ExprDeleteNN /usr/src/debug/dev-db/sqlite-3.32.3-r1/sqlite-src-3320300-abi_x86_64.amd64/sqlite3.c:100658
        #2 0x7f3c5d60a02b in sqlite3ExprDeleteNN /usr/src/debug/dev-db/sqlite-3.32.3-r1/sqlite-src-3320300-abi_x86_64.amd64/sqlite3.c:100658
        #3 0x7f3c5d60a08a in sqlite3ExprDelete /usr/src/debug/dev-db/sqlite-3.32.3-r1/sqlite-src-3320300-abi_x86_64.amd64/sqlite3.c:100680
        #4 0x7f3c5d60a831 in yy_destructor /usr/src/debug/dev-db/sqlite-3.32.3-r1/sqlite-src-3320300-abi_x86_64.amd64/sqlite3.c:155691
        #5 0x7f3c5d60a8c9 in yy_pop_parser_stack /usr/src/debug/dev-db/sqlite-3.32.3-r1/sqlite-src-3320300-abi_x86_64.amd64/sqlite3.c:155788
        #6 0x7f3c5d60a8e3 in sqlite3ParserFinalize /usr/src/debug/dev-db/sqlite-3.32.3-r1/sqlite-src-3320300-abi_x86_64.amd64/sqlite3.c:155796
        #7 0x7f3c5d69e91c in sqlite3RunParser /usr/src/debug/dev-db/sqlite-3.32.3-r1/sqlite-src-3320300-abi_x86_64.amd64/sqlite3.c:159795
        #8 0x7f3c5d69ee6e in sqlite3Prepare /usr/src/debug/dev-db/sqlite-3.32.3-r1/sqlite-src-3320300-abi_x86_64.amd64/sqlite3.c:128705
        #9 0x7f3c5d69efef in sqlite3LockAndPrepare /usr/src/debug/dev-db/sqlite-3.32.3-r1/sqlite-src-3320300-abi_x86_64.amd64/sqlite3.c:128777
        #10 0x7f3c5d69f0ba in sqlite3Reprepare /usr/src/debug/dev-db/sqlite-3.32.3-r1/sqlite-src-3320300-abi_x86_64.amd64/sqlite3.c:128810
        #11 0x7f3c5d6903bb in sqlite3_step /usr/src/debug/dev-db/sqlite-3.32.3-r1/sqlite-src-3320300-abi_x86_64.amd64/sqlite3.c:83594

        (...)

        #25 0x7f3c5b95aef9 in start_thread (/lib64/libpthread.so.0+0x7ef9)
        #26 0x7f3c59ea4cfe in clone (/lib64/libc.so.6+0xf9cfe)

    AddressSanitizer can not provide additional info.

    ```

    I can do this whole day.

3. Finally, with 3.33.0, the following comes up pretty consistently:

    ```
    =================================================================
    ==3516==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f19a5c13123 bp 0x6120005cd678 sp 0x7f1982dc59b0 T15)
    ==3516==The signal is caused by a READ memory access.
    ==3516==Hint: address points to the zero page.
        #0 0x7f19a5c13122 in sqlite3DbMallocRawNN /usr/src/debug/dev-db/sqlite-3.33.0/sqlite-src-3330000-abi_x86_64.amd64/sqlite3.c:27921
        #1 0x7f19a5c131e9 in sqlite3DbMallocRaw /usr/src/debug/dev-db/sqlite-3.33.0/sqlite-src-3330000-abi_x86_64.amd64/sqlite3.c:27900
        #2 0x7f19a5c287d6 in sqlite3VdbeMemTranslate /usr/src/debug/dev-db/sqlite-3.33.0/sqlite-src-3330000-abi_x86_64.amd64/sqlite3.c:30904
        #3 0x7f19a5c28cb7 in sqlite3VdbeChangeEncoding /usr/src/debug/dev-db/sqlite-3.33.0/sqlite-src-3330000-abi_x86_64.amd64/sqlite3.c:76045
        #4 0x7f19a5c28d5c in valueToText /usr/src/debug/dev-db/sqlite-3.33.0/sqlite-src-3330000-abi_x86_64.amd64/sqlite3.c:77071
        #5 0x7f19a5c28e87 in sqlite3ValueText /usr/src/debug/dev-db/sqlite-3.33.0/sqlite-src-3330000-abi_x86_64.amd64/sqlite3.c:77116
        #6 0x7f19a5c39be3 in sqlite3_value_text16 /usr/src/debug/dev-db/sqlite-3.33.0/sqlite-src-3330000-abi_x86_64.amd64/sqlite3.c:83209
        #7 0x7f19a5c39c79 in columnName /usr/src/debug/dev-db/sqlite-3.33.0/sqlite-src-3330000-abi_x86_64.amd64/sqlite3.c:84202
        #8 0x7f19a5c39d38 in sqlite3_column_decltype16 /usr/src/debug/dev-db/sqlite-3.33.0/sqlite-src-3330000-abi_x86_64.amd64/sqlite3.c:84252

        (...)

        #22 0x7f19a29f674b  (/usr/lib64/libQt5Core.so.5+0xac74b)
        #23 0x7f19a3f69ef9 in start_thread (/lib64/libpthread.so.0+0x7ef9)
        #24 0x7f19a24b3cfe in clone (/lib64/libc.so.6+0xf9cfe)

    AddressSanitizer can not provide additional info.

    ==3516==ABORTING
    ```

All of these errors across all versions disappear when we comment out the CREATE TEMP TRIGGER statement.

According to pragma compile_options, we have THREADSAFE=1 set, among other flags chosen by the gentoo devs. Compiler is gcc 9.3.0 for sqlite 3.33.x and 3.32.x. For 3.31.x, we seem to have used clang-10 for no particular reason. We are compiling sqlite amalgamation with -Og -ggdb.

I'm happy to provide additional info if requested. Alternatively, I'd greatly appreciate a nudge in the right direction if you think this is our fault.

Best regards,
Burak

(2) By Keith Medcalf (kmedcalf) on 2020-09-11 00:17:09 in reply to 1 [link]

Seems to work with the current version:

```
sqlite> create table x (x);
sqlite> create trigger xx after insert on x begin
   ...> select sin(.5);
   ...> end;
sqlite> insert into x values (1);
sqlite> drop trigger xx;
sqlite> create temporary trigger xx after insert on x begin
   ...> select sin(.5);
   ...> end;
sqlite> insert into x values (1);
sqlite> .eqp full
sqlite> insert into x values (1);
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     10    0                    0   Start at 10
1     OpenWrite      0     2     0     1              0   root=2 iDb=0; x
2     Integer        1     2     0                    0   r[2]=1
3     NewRowid       0     1     0                    0   r[1]=rowid
4     Noop           0     0     0                    0   BEGIN: GenCnstCks(0,1,1,0,0)
5     MakeRecord     2     1     3                    0   r[3]=mkrec(r[2])
6     Noop           0     0     0                    0   END: GenCnstCks(0)
7     Insert         0     3     1     x              57  intkey=r[1] data=r[3]
8     Program        -1    9     4     program        0   Call: xx.default
9     Halt           0     0     0                    0
10    Transaction    0     1     3     0              1   usesStmtJournal=0
11    Goto           0     1     0                    0
0     Init           0     1     0     -- TRIGGER xx  0   Start at 1; Start: xx.default (AFTER INSERT ON x)
1     Trace          2147483647  1     0     -- select sin(.5)  0
2     Noop           0     0     0                    0   Begin WHERE-core
3     Real           0     2     0     0.5            0   r[2]=0.5
4     Function       1     2     1     sin(1)         0   r[1]=func(r[2])
5     Noop           0     0     0                    0   End WHERE-core
6     Halt           0     0     0                    0   End: xx.default
sqlite> .exit
```

sin(x) is a "custom function" that computes the sine function using the math library.  Going back to the 3.31.1 version seems to work fine too:

```
SQLite version 3.31.1 2020-01-27 21:57:15
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .eqp off
sqlite> create table x(x);
sqlite> create temporary trigger xx after insert on x
   ...> begin
   ...>  select sin(.5);
   ...> end;
sqlite> .eqp full
sqlite> insert into x values (1);
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     10    0                    00  Start at 10
1     OpenWrite      0     2     0     1              00  root=2 iDb=0; x
2     Integer        1     2     0                    00  r[2]=1
3     NewRowid       0     1     0                    00  r[1]=rowid
4     Noop           0     0     0                    00  BEGIN: GenCnstCks(0,1,1,0,0)
5     MakeRecord     2     1     3                    00  r[3]=mkrec(r[2])
6     Noop           0     0     0                    00  END: GenCnstCks(0)
7     Insert         0     3     1     x              39  intkey=r[1] data=r[3]
8     Program        -1    9     4     program        00  Call: xx.default
9     Halt           0     0     0                    00
10    Transaction    0     1     1     0              01  usesStmtJournal=0
11    Goto           0     1     0                    00
0     Init           0     1     0     -- TRIGGER xx  00  Start at 1; Start: xx.default (AFTER INSERT ON x)
1     Trace          2147483647  1     0     -- select sin(.5)  00
2     Noop           0     0     0                    00  Begin WHERE-core
3     Real           0     2     0     0.5            00  r[2]=0.5
4     Function       1     2     1     sin(1)         00  r[1]=func(r[2])
5     Noop           0     0     0                    00  End WHERE-core
6     Halt           0     0     0                    00  End: xx.default
sqlite>
```

Perhaps it is your function some_function.  If you replace some_function() with a different function (example, datetime) do you get the same result?

```
SQLite version 3.31.1 2020-01-27 21:57:15
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> create table x(x);
sqlite> create temporary trigger xx after insert on x
   ...> begin
   ...>  select datetime();
   ...> end;
sqlite> insert into x values (1);
sqlite> .eqp full
sqlite> insert into x values (2);
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     10    0                    00  Start at 10
1     OpenWrite      0     2     0     1              00  root=2 iDb=0; x
2     Integer        2     2     0                    00  r[2]=2
3     NewRowid       0     1     0                    00  r[1]=rowid
4     Noop           0     0     0                    00  BEGIN: GenCnstCks(0,1,1,0,0)
5     MakeRecord     2     1     3                    00  r[3]=mkrec(r[2])
6     Noop           0     0     0                    00  END: GenCnstCks(0)
7     Insert         0     3     1     x              39  intkey=r[1] data=r[3]
8     Program        -1    9     4     program        00  Call: xx.default
9     Halt           0     0     0                    00
10    Transaction    0     1     1     0              01  usesStmtJournal=0
11    Goto           0     1     0                    00
0     Init           0     1     0     -- TRIGGER xx  00  Start at 1; Start: xx.default (AFTER INSERT ON x)
1     Trace          2147483647  1     0     -- select datetime()  00
2     Noop           0     0     0                    00  Begin WHERE-core
3     Function       0     0     1     datetime(-1)   00  r[1]=func(r[0])
4     Noop           0     0     0                    00  End WHERE-core
5     Halt           0     0     0                    00  End: xx.default
sqlite> .eqp off
sqlite> select * from x;
1
2
sqlite>
```

(3) By Burak Arslan (burakarslan) on 2020-09-11 09:15:00 in reply to 2 [link]

Hello Keith,

Thanks for your reply. Sorry if I wasn't clear but:

1. Everything works fine in a single threaded environment. We start seeing the issues only when we introduce additional readers in separate threads. CREATE TEMP TRIGGER seems to be the culprit, though I can't be sure.

2. the function only does logging (ie printf)

We are continuing the investigations at our end, I will post an update if we come up with something.

Best regards,

Burak

(4) By Larry Brasfield (LarryBrasfield) on 2020-09-11 14:11:51 in reply to 3 [link]

> CREATE TEMP TRIGGER seems to be the culprit, though I can't be sure.

It is way too early in your debugging process to be naming culprits. You not only cannot be sure -- such thinking is likely to distract or derail the debugging that needs to be done.  What you have learned (apparently) is that the problem becomes manifest when (somehow) the CREATE TEMP TRIGGER statement is run or the trigger executes. (Which is it?) But when a problem becomes manifestly obvious and when things went wrong which ultimately lead to such manifestation are usually different times, and often much different in terms of intervening activity. Heap corruption and its ultimate effects provide a good example. (Is that happening?)

Keith asked, "If you replace some_function() with a different function (example, datetime) do you get the same result?" Seeing no answer to that question, I think you clearly have not begun to to flush out the bugs in your own code.

If you are using SQLite correctly, its behavior should be no different between a single-threaded environment and a multi-threaded one, except of course for the possibilities arising from DB operation ordering and interaction. I would consider such evidence as "It's fine single-threaded but fails multi-threaded." to be a potent clue, leading me to see if my home-spun write-access protection is working and to see if the same sequence of DB operations that occur in the failure scenario produce the same problem when they are forced to execute sequentially. As you surely know, multi-threaded applications provide a wondrous variety of ways to create bugs. You may be seeing nothing more than yet another proof of this.

(5) By Richard Hipp (drh) on 2020-09-11 14:28:13 in reply to 1

I think you are assuming that the problem you are having is in SQLite.  And,
who knows, that could yet prove to be the case.  But historically, when problems
like this have come up, and especially problems with memory allocation, the
error has almost always been heap corruption caused by other unrelated
modules of the application that SQLite just happen to have the misfortune of
tripping over.