"database disk image is malformed" with an in-memory database?

(1) By Philip O'Toole (otoolep) on 2022-11-01 20:22:17 [source]

Hi - I'm the creator of rqlite, the distributed database which uses vanilla SQLite as its storage engine. rqlite uses an in-memory SQLite database.

One of my users is reporting that they get this error occasionally when querying the database. Any idea what could be causing this?

Full details below -- thanks.

Philip

GitHub ticket: https://github.com/rqlite/rqlite/issues/1103

Machine: amd64, Linux.

SQLite version: 3.38.5

Connection used to query the database: file:/FiGbErArFfLlJdfkfsfB?mode=ro&vfs=memdb&_txlock=deferred&_fk=true

Connection used to write to the database: file:/FiGbErArFfLlJdfkfsfB?mode=rw&vfs=memdb&_txlock=immediate&_fk=true

Serialized size of the SQLite database and other memory stats:

        "db_size": 129212416,
        "mem_stats": {
            "cache_size": -2000,
            "freelist_count": 0,
            "hard_heap_limit": 0,
            "max_page_count": 1073741823,
            "page_count": 31546,
            "page_size": 4096,
            "soft_heap_limit": 0
        },

Compilation options:

compile_options": [ "ATOMIC_INTRINSICS=1", "COMPILER=gcc-9.4.0", "DEFAULT_AUTOVACUUM", "DEFAULT_CACHE_SIZE=-2000", "DEFAULT_FILE_FORMAT=4", "DEFAULT_JOURNAL_SIZE_LIMIT=-1", "DEFAULT_MMAP_SIZE=0", "DEFAULT_PAGE_SIZE=4096", "DEFAULT_PCACHE_INITSZ=20", "DEFAULT_RECURSIVE_TRIGGERS", "DEFAULT_SECTOR_SIZE=4096", "DEFAULT_SYNCHRONOUS=2", "DEFAULT_WAL_AUTOCHECKPOINT=1000", "DEFAULT_WAL_SYNCHRONOUS=1", "DEFAULT_WORKER_THREADS=0", "ENABLE_DBSTAT_VTAB", "ENABLE_FTS3", "ENABLE_FTS3_PARENTHESIS", "ENABLE_RTREE", "ENABLE_UPDATE_DELETE_LIMIT", "MALLOC_SOFT_LIMIT=1024", "MAX_ATTACHED=10", "MAX_COLUMN=2000", "MAX_COMPOUND_SELECT=500", "MAX_DEFAULT_PAGE_SIZE=8192", "MAX_EXPR_DEPTH=1000", "MAX_FUNCTION_ARG=127", "MAX_LENGTH=1000000000", "MAX_LIKE_PATTERN_LENGTH=50000", "MAX_MMAP_SIZE=0x7fff0000", "MAX_PAGE_COUNT=1073741823", "MAX_PAGE_SIZE=65536", "MAX_SQL_LENGTH=1000000000", "MAX_TRIGGER_DEPTH=1000", "MAX_VARIABLE_NUMBER=32766", "MAX_VDBE_OP=250000000", "MAX_WORKER_THREADS=8", "MUTEX_PTHREADS", "OMIT_DEPRECATED", "OMIT_LOAD_EXTENSION", "OMIT_SHARED_CACHE", "SYSTEM_MALLOC", "TEMP_STORE=1", "THREADSAFE=1" ],

(2) By Simon Slavin (slavin) on 2022-11-03 02:21:05 in reply to 1 [link] [source]

Checking that I understand your post. The program, and the SQLite calls it makes, work fine for a while. Then, without doing anything significant or new, queries which have worked before stop working. Is that right ?

I'm concerned at an in-memory database which has "db_size": 129212416. That's close to a 128Meg database in memory. Is this an unusually big memory for your users ? Are other users using bigger databases than this without problems ? Can you fake up a database bigger than that yourself and see whether your code handles it without problems ?

I'm betting that SQLite is trying and failing to allocate more memory. Or it's busting a cache or something.

(3) By Philip O'Toole (otoolep) on 2022-11-03 03:42:02 in reply to 2 [link] [source]

According to the initial report (see See https://github.com/rqlite/rqlite/issues/1103#issuecomment-1299062521):

"Just one extra information. In all cases that happened so far, the SELECT was trying to find a record that was just INSERTED. In one case the INSERT was performed at 2022-11-01 10:01:42,601 and the SELECT at 2022-11-01 10:01:42,731"

I was also wondering about the memory of the host machine, but it looks sufficient (see https://github.com/rqlite/rqlite/issues/1103#issuecomment-1299106153). But, of course, perhaps the process itself has lower limits and it's causing SQLite problems.

128MB is not too big. While rqlited isn't designed for very large datasets, I haven't heard of other problems until databases started hitting a 2GB limit (but that is a known limit with SQLite).

I'll ask some more questions on the GitHub ticket.

(4) By anonymous on 2022-11-03 06:02:27 in reply to 3 [link] [source]

The recover extension (see source tree) has an user defined function of:

** The implementation of a user-defined SQL function invoked by the

** sqlite_dbdata and sqlite_dbptr virtual table modules to access pages

** of the database being recovered.

** This function always takes a single integer argument. If the argument

** is zero, then the value returned is the number of pages in the db being

** recovered. If the argument is greater than zero, it is a page number.

** The value returned in this case is an SQL blob containing the data for

** the identified page of the db being recovered. e.g.

** SELECT getpage(0); -- return number of pages in db

** SELECT getpage(4); -- return page 4 of db as a blob of data

I have not had to use it, but might be useful to you.

(5) By anonymous on 2022-11-03 06:04:28 in reply to 3 [link] [source]

scratch that suggestion... but a more general one of tossing the recover extension at it might show up something for you (and then trying to get the pages that are covered via the select in question?).

(6) By Simon Slavin (slavin) on 2022-11-03 14:07:26 in reply to 3 [link] [source]

Yeah, this is part of what made me think of something hitting a max-memory boundary which SQLite cannot detect. Because I think it's the INSERT which is failing, leaving a database which is malformed. One way to investigate is to find out whether your user always has trouble when the database reaches or exceeds the size of the one she/he reported.

I understand that there's no reason for SQLite itself to baulk at a 128M database in memory. It's supposed to work fine, as long as the OS supports it, and to give an error if it doesn't.

Can you tell us the OS ? I can see it's amd64, and they used 'cat', but I can't deduce anything else. I was interested to know if the people having this problem are using Kubernetes, but I see that their report says they aren't. Can you confirm that they're running on physical piece of hardware and not some kind of virtual or simulated environment ?

BTW, there's something wrong with the cert for https://www.rqlite.com. Try getting there from a browser you don't normally use.

(7) By Michael A. Cleverly (cleverly) on 2022-11-03 14:25:15 in reply to 6 [link] [source]

To elaborate on Simon's point that there is something wrong with the www.rqlite.com cert: the cert being presented is issued by Let's Encrypt for www.philipotoole.com rather than www.rqlite.com (hostname mismatch).

(8) By Philip O'Toole (otoolep) on 2022-11-08 13:58:31 in reply to 6 [link] [source]

It's Alpine Linux v3.16.

(9) By Philip O'Toole (otoolep) on 2022-11-08 13:58:53 in reply to 7 [link] [source]

Where you directed to https://www.rqlite.com? I'd like to fix that.

(10) By Philip O'Toole (otoolep) on 2022-11-09 04:45:24 in reply to 1 [link] [source]

I have more details. To quote https://github.com/mattn/go-sqlite3/pull/1108#issuecomment-1308200668:

"I don't know what the underlying issue is, but I was able to get a bit more information from the SQLite error logs.

11: database corruption at line 69145 of [a29f994989]
11: statement aborts at 2: [SELECT COUNT(*) FROM logs] database disk image is malformed

That means the corruption is detected on line 69145 of sqlite3-binding.c. However, this output is not consistent - sometimes the corruption is detected on line 70169, but usually it is on line 69145.

That is deeper in the internals of the library than I am familiar with. If you can reproduce with a pure C version (using pthreads I guess) then you should file a bug on the forum."

(11) By Philip O'Toole (otoolep) on 2022-11-09 04:51:37 in reply to 1 [link] [source]

Sometimes this is being triggered in this code:

static int getAndInitPage(
  BtShared *pBt,                  /* The database file */
  Pgno pgno,                      /* Number of the page to get */
  MemPage **ppPage,               /* Write the page pointer here */
  BtCursor *pCur,                 /* Cursor to receive the page, or NULL */
  int bReadOnly                   /* True for a read-only page */
){
  int rc;
  DbPage *pDbPage;
  assert( sqlite3_mutex_held(pBt->mutex) );
  assert( pCur==0 || ppPage==&pCur->pPage );
  assert( pCur==0 || bReadOnly==pCur->curPagerFlags );
  assert( pCur==0 || pCur->iPage>0 );

  if( pgno>btreePagecount(pBt) ){
    rc = SQLITE_CORRUPT_BKPT;     ##### <---- here
    goto getAndInitPage_error1;
  }
  rc = sqlite3PagerGet(pBt->pPager, pgno, (DbPage**)&pDbPage, bReadOnly);
  if( rc ){
    goto getAndInitPage_error1;
  }
  *ppPage = (MemPage*)sqlite3PagerGetExtra(pDbPage);
  if( (*ppPage)->isInit==0 ){
    btreePageFromDbPage(pDbPage, pgno, pBt);
    rc = btreeInitPage(*ppPage);
    if( rc!=SQLITE_OK ){
      goto getAndInitPage_error2;
    }
  }
  assert( (*ppPage)->pgno==pgno || CORRUPT_DB );
  assert( (*ppPage)->aData==sqlite3PagerGetData(pDbPage) );

(12) By Simon Slavin (slavin) on 2022-11-09 13:20:04 in reply to 9 [link] [source]

The URL retrieves the right website. But it came up as a security risk because the certificate had the name of the site wrong.

(13) By Simon Slavin (slavin) on 2022-11-09 17:24:30 in reply to 11 [link] [source]

Would you be able to tell us the result of sqlite3_extended_errcode() ?

I'm 90% sure that the fault is being triggered by the INSERT command preceeding the SELECT. Would it be possible to temporarily add code to execute

PRAGMA quick_check

both before and after the command ? Perhaps something which executes even faster than that would be just as good, for example

SELECT * FROM MyTable LIMIT 1

Once again, the extended error code would be useful. It would be especially interested if the one before the INSERT reports a fault.

Having said that, I'm not sure how much good it will do. Suppose the INSERT command is messing things up. What would you investigate next ? Would it necessarily suggest something ran out of space ?

(14) By Philip O'Toole (otoolep) on 2022-11-10 01:24:00 in reply to 13 [link] [source]

Thanks for your response. Interestingly when I change my test so that

SELECT COUNT(*) FROM logs`

is replaced with

SELECT * FROM logs LIMIT 1

I can't get the failure to occur. Does that tell us anything?

(15) By Jesse Rittner (rittneje) on 2022-11-10 02:24:33 in reply to 13 [link] [source]

There is no extended error code. It just gives 11 (i.e., plain SQLITE_CORRUPT).

Running both PRAGMA quick_check and PRAGMA integrity_check on the connection that gave SQLITE_CORRUPT after the failure does not work. Usually it just says "ok", but on occasion one or both with fail with a plain SQLITE_CORRUPT.

Running those pragmas on the connection that is doing the inserts causes the issue to no longer reproduce.

Running SELECT * FROM logs LIMIT 1 causes the issue to no longer reproduce. However, I do observe the same corruption failure with SELECT * FROM logs ORDER BY rowid DESC LIMIT 1. Here I was able to get PRAGMA quick_check to output something:

*** in database main ***
On tree page 6935 cell 17: 2nd reference to page 7182
On tree page 6935 cell 398: Rowid 2239121 out of order
On tree page 6935 cell 398: 2nd reference to page 7181
On tree page 6935 cell 397: 2nd reference to page 7180
On tree page 6935 cell 396: 2nd reference to page 7179
On tree page 6935 cell 395: 2nd reference to page 7178
On tree page 6935 cell 394: 2nd reference to page 7177
On tree page 6935 cell 393: 2nd reference to page 7176
On tree page 6935 cell 392: 2nd reference to page 7175
On tree page 6935 cell 391: 2nd reference to page 7174

This is what the SQLite error logs show:

database corruption at line 69145 of [a29f994989]
statement aborts at 4: [SELECT * FROM logs ORDER BY rowid DESC LIMIT 1] database disk image is malformed

It is random when the failure is observed, which leads me to believe it is due to some race condition either in the memdb VFS or in the SQLite core itself.

(16) By anonymous on 2022-11-10 02:30:45 in reply to 14 [link] [source]

FWIW: I have not looked at it closely, and it's just a hunch, yet I wonder if the assert mutex should be held prior to calling getAndInitPage() to prevent the possibility of the ppage and pgno being changed as the function is called? If an insert were to happen while the assert mutex is going about it's thing, would not the ppage and pgno passed values be out of date/incorrect? I would image that only inserts that happen during that time, and only inserts that change those passed values would make it intermittent.

(17) By Philip O'Toole (otoolep) on 2022-12-06 03:01:16 in reply to 15 [link] [source]

Just closing the loop here, turned out to be a bug in the SQLite source. All fixed now.

https://sqlite.org/forum/forumpost/5adb92e2baca3678