SQLite Forum

sessionfuzz fails on some architectures (ARM, PPC, SPARC)

(1.1) By Arfrever Frehtes Taifersar Arahesis (Arfrever) on 2020-07-22 20:11:45 edited from 1.0 [source]

Some users have reported that sessionfuzz fails on some architectures (ARM, PPC, SPARC).

I do not know whether the underlying cause of the problem is the same on all of these architectures.

I do not have access to that hardware. sessionfuzz succeeds for me on x86_32 and x86_64.



SQLite configuration:

./configure --enable-load-extension --enable-threadsafe --enable-fts5 --enable-session --disable-debug --disable-editline --enable-readline --with-readline-inc=-I/usr/include/readline --disable-static --enable-tcl

GDB output on PPC:

Reading symbols from ./sessionfuzz...
(gdb) r
Starting program: /var/tmp/portage/dev-db/sqlite-3.32.3/work/sqlite-src-3320300-.ppc/sessionfuzz run test/sessionfuzz-data1.db
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
sessionfuzz-data1.db: sessionfuzz: ./sqlite3.c:57398: pager_open_journal: Assertion `rc!=SQLITE_OK || isOpen(pPager->jfd)' failed.

Program received signal SIGABRT, Aborted.
0xf7e0d1d0 in raise () from /lib/libc.so.6
(gdb) bt
#0  0xf7e0d1d0 in raise () from /lib/libc.so.6
#1  0xf7df2e2c in abort () from /lib/libc.so.6
#2  0xf7e03b2c in ?? () from /lib/libc.so.6
#3  0xf7e03bb0 in __assert_fail () from /lib/libc.so.6
#4  0x004c85d0 in pager_open_journal (pPager=0x6448e0) at ./sqlite3.c:57398
#5  pager_write (pPg=0x66d6a0) at ./sqlite3.c:57596
#6  0x004cfaf4 in insertCell (pPage=0x66d6c8, i=0, pCell=0x643bb4 "\002\002\r", sz=4, pTemp=0x0, iChild=0, pRC=0xffffda40) at ./sqlite3.c:71088
#7  0x004d9bb4 in sqlite3BtreeInsert (pCur=0x64d398, pX=0xffffdb70, flags=<optimized out>, seekResult=<optimized out>) at ./sqlite3.c:73173
#8  0x0051d110 in sqlite3VdbeExec (p=<optimized out>) at ./sqlite3.c:90754
#9  0x0052b354 in sqlite3Step (p=0x64dc90) at ./sqlite3.c:83488
#10 sqlite3_step (pStmt=0x64dc90) at ./sqlite3.c:18017
#11 0x00565aa0 in sessionApplyOneOp (pIter=0x656f10, p=0xffffdee8, xConflict=0x447ca4 <conflictCall>, pCtx=0x0, pbReplace=0xffffde74, pbRetry=0xffffde78) at ./sqlite3.c:205933
#12 0x00565ee8 in sessionApplyOneWithRetry (db=0x6431c0, pIter=0x656f10, pApply=0xffffdee8, xConflict=0x447ca4 <conflictCall>, pCtx=0x0) at ./sqlite3.c:205968
#13 0x00591948 in sessionChangesetApply (db=0x6431c0, pIter=0x656f10, xFilter=0x0, xConflict=0x447ca4 <conflictCall>, pCtx=0x0, ppRebase=0x0, pnRebase=0x0, flags=<optimized out>)
    at ./sqlite3.c:206207
#14 0x005928f4 in sqlite3changeset_apply_v2 (db=0x6431c0, nChangeset=<optimized out>, pChangeset=<optimized out>, xFilter=0x0, xConflict=0x447ca4 <conflictCall>, pCtx=0x0,
    ppRebase=0x0, pnRebase=0x0, flags=0) at ./sqlite3.c:206288
#15 0x00417bb0 in sqlite3changeset_apply (pCtx=0x0, xConflict=0x447ca4 <conflictCall>, xFilter=0x0, pChangeset=0x6664d0, nChangeset=<optimized out>, db=<optimized out>)
    at ./sqlite3.c:206315
#16 main (argc=3, argv=<optimized out>) at /var/tmp/portage/dev-db/sqlite-3.32.3/work/sqlite-src-3320300-.ppc/test/sessionfuzz.c:930

(2) By Richard Hipp (drh) on 2020-07-23 20:59:41 in reply to 1.1 [link] [source]

Can you please try the patch at check-in 40c44d38104dfcb6 and let me know if it works better for you on your platforms? I am only able to confirm the problem on ARM using GCC 8.3.0 and -O2.

Other potential workarounds for this problem include:

  • Use an earlier version of GCC that does not have the bug.
  • Use -Os instead of -O2.

Unfortunately, I do not at this time have a simple test case to demonstrate the GCC problem. The easiest demonstration I have is the following:

  1. Download https://sqlite.org/tmp/gcc-problem-20200723/sftest.c (7.8MB) and https://sqlite.org/tmp/gcc-problem-20200723/data1.db (252KB).
  2. Compile using: "$GCC -O2 sftest.c"
  3. Run like this: "./a.out run data1.db"

If you uncomment the "SQLITE_NOINLINE" macro (which resolves to __attribute__((noinline)) for gcc) on line 97448 of sftest.c, then it works. If you change the -O2 in step 2 to any other optimization level, it appears to work as well.

The problem appears to be here:

57504    if( !isOpen(pPager->jfd) ){
57505      if( pPager->journalMode==PAGER_JOURNALMODE_MEMORY ){
57506        sqlite3MemJournalOpen(pPager->jfd);
57507      }else{
57509        int nSpill;
57511        if( pPager->tempFile ){
57513          nSpill = sqlite3Config.nStmtSpill;
57514        }else{
57515          flags |= SQLITE_OPEN_MAIN_JOURNAL;
57516          nSpill = jrnlBufferSize(pPager);
57517        }
57519        /* Verify that the database still has the same name as it did when
57520        ** it was originally opened. */
57521        rc = databaseIsUnmoved(pPager);
57522        if( rc==SQLITE_OK ){
57523          rc = sqlite3JournalOpen (
57524              pVfs, pPager->zJournal, pPager->jfd, flags, nSpill
57525          );
57526        }
57527      }
57528      assert( rc!=SQLITE_OK || isOpen(pPager->jfd) );
57529    }

The assert() on line 57528 is the one that is failing in the Gentoo bug reports. The isOpen() is a macro:

   #define isOpen(X) ((X)->pMethods!=0)

So on line 57504, pMethods is NULL. But it gets changed to non-NULL by the sqlite3MemJournalOpen() call on line 57506. My best guess is that the gcc optimizer does not understand that the value of pMethods might (and does) change and so it remembers the prior NULL value and uses it in the assert() on line 57528, causing the assert() to fail even though pMethods is now non-NULL. Preventing the sqlite3MemJournalOpen() call from being inlined clears the problem.
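To make the pattern described above concrete, here is a minimal sketch of its shape. Every name in it (DemoFile, DemoMethods, demoIsOpen, demoMemJournalOpen, demoOpenJournal, DEMO_NOINLINE) is a hypothetical stand-in and not SQLite code. It only illustrates the "check, call an opener, check again" sequence and a noinline attribute macro; it is well-defined C and is not expected to reproduce the failure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro in the spirit of SQLITE_NOINLINE: asks GCC-compatible
** compilers not to inline the function; expands to nothing elsewhere. */
#if defined(__GNUC__)
# define DEMO_NOINLINE __attribute__((noinline))
#else
# define DEMO_NOINLINE
#endif

/* Hypothetical stand-ins for a file handle and its method table. */
typedef struct DemoMethods { int iVersion; } DemoMethods;
typedef struct DemoFile { const DemoMethods *pMethods; } DemoFile;

/* Same shape as the isOpen() macro quoted above. */
#define demoIsOpen(X) ((X)->pMethods!=0)

static const DemoMethods demoMethods = { 1 };

/* The opener changes pMethods from NULL to non-NULL. */
static DEMO_NOINLINE void demoMemJournalOpen(DemoFile *pFile){
  pFile->pMethods = &demoMethods;
}

/* Returns 1 if the handle was closed on entry and is open on exit,
** 0 if it was already open. The second demoIsOpen() must observe the
** store made inside demoMemJournalOpen(). */
static int demoOpenJournal(DemoFile *pFile){
  int wasOpen = demoIsOpen(pFile);
  if( !wasOpen ){
    demoMemJournalOpen(pFile);
    assert( demoIsOpen(pFile) );
  }
  return !wasOpen && demoIsOpen(pFile);
}
```

Calling demoOpenJournal() on a zero-initialized DemoFile should return 1, and a second call on the same handle should return 0. Because the store and the re-check here go through the same struct type, the compiler has to reload pMethods whether or not the opener is inlined.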

(3) By Richard Hipp (drh) on 2020-07-24 08:36:20 in reply to 2 [link] [source]

This turns out to be a misuse of pointer aliasing by SQLite, not a bug in GCC. I'll fix the SQLite code soon.

I could insert here a lengthy rant about how the C programming language has been thoroughly broken by the "standards committee" to the point that it is no longer possible to use it safely and effectively. But that rant has been made by others already who are far more eloquent than me, so I will spare you all.

(4) By Richard Hipp (drh) on 2020-07-24 09:16:50 in reply to 1.1 [link] [source]

The fix for this problem is in check-in 892e9191dc8f8056.

A work-around is to compile with -fno-strict-aliasing.
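For readers unfamiliar with the rule that -fno-strict-aliasing relaxes: with strict aliasing in effect, the compiler may assume that an object is not accessed through an lvalue of an incompatible type, so such an access is undefined behavior. The sketch below is not the change made in the check-in above (its contents are not shown in this thread). It is only a generic illustration of one conforming way to reinterpret an object's bytes, by copying them with memcpy() instead of casting a pointer to another type. The function name demoFloatBits is hypothetical, and the example values assume that float uses the 32-bit IEEE 754 format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the bit pattern of a float as a 32-bit unsigned integer.
** Writing *(uint32_t*)&f instead would read a float object through an
** incompatible lvalue type, which the strict-aliasing rule forbids.
** Copying the bytes is well-defined with or without -fno-strict-aliasing. */
static uint32_t demoFloatBits(float f){
  uint32_t u;
  assert( sizeof(u)==sizeof(f) );  /* Assumption: float is 32 bits wide */
  memcpy(&u, &f, sizeof(u));
  return u;
}
```

On a platform with IEEE 754 single-precision floats, demoFloatBits(1.0f) should yield 0x3f800000. Compilers commonly turn a small fixed-size memcpy() like this into a plain register move, so the conforming form usually costs nothing.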

(5) By Arfrever Frehtes Taifersar Arahesis (Arfrever) on 2020-07-26 23:14:18 in reply to 4 [link] [source]

Rolf Eike Beer confirmed that the new fixes work on SPARC.

Sam James confirmed that the new fixes work on ARM and PPC.