[SEE] crash at sqlite3CodecGetKey then vsnprintf on Windows 10

(1) By tham (thamht4190) on 2021-02-18 10:58:53

Hi there,

I received a one-time crash report from a customer at this line of code:

const int ret = sqlite3_exec(mpSQLite, "PRAGMA optimize", NULL, NULL, NULL);

and I cannot reproduce it.

Given your experience with SQLite, could you suggest when this might happen? (I'm using the SEE version of SQLite; the source file is named sqlite3-see-aes256-ofb.c.)

Here is the stack trace:

STACK_TEXT:  
00000045`4cefaba0 00007ff6`e0371342     : 00007ff6`e0851258 000001cc`9db9a030 00000000`00000000 00007ff6`e0851258 : CortexService!vsnprintf+0x4a187
00000045`4cefac20 00007ff6`e0398d73     : 00000000`00000000 00000045`4cefad29 00000000`00000000 000001cc`a0bfaa80 : CortexService!vsnprintf+0x65082
00000045`4cefac80 00007ff6`e0398599     : 00007ff6`00000000 00000000`000005d2 000001cc`a0bfb3e0 00007ff6`e03b7ba0 : CortexService!sqlite3CodecGetKey+0x1f2a3
00000045`4cefad90 00007ff6`e03a6fe5     : 00000045`4cefbb40 00007ff6`ddc00000 00000045`4cefaef0 000001cc`9db46d70 : CortexService!sqlite3CodecGetKey+0x1eac9
00000045`4cefadc0 00007ff6`e03a1d63     : 000001cc`a0bfb3e0 000001cc`9db46d70 00000000`00000000 00000000`00000000 : CortexService!sqlite3CodecGetKey+0x2d515
00000045`4cefadf0 00007ff6`e040113d     : 00000000`00000001 00000000`00000000 00000000`00000000 00007ff6`e15d5710 : CortexService!sqlite3CodecGetKey+0x28293
00000045`4cefaf90 00007ff6`e03a0e4a     : 00000045`4cefb130 00000000`00000410 00000045`4cefb120 000001cc`9db46b00 : CortexService!sqlite3_win32_write_debug+0x1dcbd
00000045`4cefb070 00007ff6`e03a96c5     : 00000000`00000001 00000045`4cefb1e0 00000000`0000003b 000001cc`9db46b00 : CortexService!sqlite3CodecGetKey+0x2737a
00000045`4cefb0e0 00007ff6`e03a680e     : 000001cc`9dba2970 000001cc`9d4c0000 000001cc`8e090000 000001cc`9db46b00 : CortexService!sqlite3CodecGetKey+0x2fbf5
00000045`4cefbb00 00007ff6`e039ca12     : 000001cc`a0bff580 00000000`0000001b 00000000`00000000 00000000`00000000 : CortexService!sqlite3CodecGetKey+0x2cd3e
00000045`4cefbd30 00007ff6`e03deb4a     : 000001cc`a0bfbd40 00000000`00000000 000001cc`ffffffff 00000045`00000080 : CortexService!sqlite3CodecGetKey+0x22f42
00000045`4cefbdb0 00007ff6`e03da2a8     : 00000000`00000000 000001cc`a0bfbd40 00000000`00000000 00000000`00000000 : CortexService!sqlite3_step+0x12a
00000045`4cefbf50 00007ff6`e042d321     : 00000000`00000000 000001cc`9da14190 00000000`00000000 00000000`00000000 : CortexService!sqlite3_exec+0x168
00000045`4cefbfe0 00007ff6`e042755a     : 00000000`00000000 000001cc`9d5b3930 00000000`00000000 00007fff`32dbe97b : CortexService!SQLite::Database::exec+0x21
00000045`4cefc040 00007ff6`e0424e9a     : 000001cc`9dc872d0 00000000`00000000 00000000`00000000 00007ff6`e042a936 : CortexService!litecore::SQLiteDataFile::optimize+0x18a
00000045`4cefc140 00007ff6`e0301142     : 000001cc`9da14790 000001cc`9da14390 000001cc`a0e76010 000001cc`8e090000 : CortexService!litecore::SQLiteDataFile::_close+0xaa
00000045`4cefc1d0 00007ff6`e0478cb8     : 000001cc`a0c914a0 00000045`4cefc469 000001cc`9cc3c1f8 000001cc`8e96aac8 : CortexService!litecore::DataFile::close+0x92
00000045`4cefc200 00007ff6`e02f0ee4     : 00000045`4cefc310 00007ffe`ef233811 000001cc`8e14d6f0 000001cc`0000030f : CortexService!c4Internal::tryCatch+0x18
00000045`4cefc240 00007ff6`e02964d5     : 00000000`00000000 000001cc`8e96a410 000001cc`8ea8cba0 000001cc`8ea8cb68 : CortexService!c4db_close+0x64

Thanks for your help,

(2) By Richard Hipp (drh) on 2021-02-18 12:41:09 in reply to 1

The stack trace makes no sense. sqlite3CodecGetKey() never calls itself, and it never invokes vsnprintf(), either directly or indirectly. Indeed, nothing inside of SQLite ever invokes vsnprintf()!

This suggests that either (1) the symbol file used to construct the stack trace is not the correct symbol file for the binary that generated the stack trace, or (2) your stack is badly corrupted, perhaps due to a buffer overwrite in some other part of the process.

(3) By tham (thamht4190) on 2021-02-19 02:14:46 in reply to 2

Thanks, Richard. Your suggestions are very helpful. If that is the case, there is nothing more I can do about this crash.

(4) By Larry Brasfield (larrybr) on 2021-02-19 03:04:19 in reply to 3

It's usually hard to diagnose one-off crashes. From the large and wandering offsets [a] in that alleged call stack dump [b], I doubt it is real, and the appearance of any particular entry point name should not be taken to mean that the routine so named actually participated in the (mis)action.

[a. e.g., sqlite3CodecGetKey+0x1f2a3 is 0x1f2a3 bytes offset from the known symbol address of sqlite3CodecGetKey ]

[b. I say "alleged" because what is dumped is a tool's effort to reconstruct what the call stack was, based on certain conventions for storing frame pointers and return addresses. The reconstruction is only right when the data going into it resulted from a sequence of nested calls. Random memory data, subject to the same reconstruction, will show a "call stack" that never happened as a sequence of calls not yet matched with returns. ]
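
To illustrate footnote [a], here is a minimal sketch, assuming a symbolizer that labels each address with the nearest preceding symbol it knows about. The Symbol type, both tables, every address, and the name internal_fn are hypothetical; none of them come from the dump above or from any real debugger.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical symbol-table entry: a name and its start address. */
typedef struct {
    const char *name;
    uint64_t    addr;
} Symbol;

/* Hypothetical table holding only exported names (addresses invented). */
const Symbol EXPORTS_ONLY[] = {
    { "sqlite3CodecGetKey", 0x10000 },
    { "sqlite3_step",       0x60000 },
};

/* The same table plus one invented internal function. */
const Symbol FULL_TABLE[] = {
    { "sqlite3CodecGetKey", 0x10000 },
    { "internal_fn",        0x2f000 },
    { "sqlite3_step",       0x60000 },
};

/* Return the symbol with the greatest start address <= pc, or NULL
   if pc lies below every symbol. syms must be sorted by address. */
const Symbol *nearest_symbol(const Symbol *syms, size_t n, uint64_t pc)
{
    const Symbol *best = NULL;
    assert(syms != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (syms[i].addr > pc) break;
        best = &syms[i];
    }
    return best;
}

/* Write "name+0xoffset" for pc into out; returns snprintf's result. */
int format_frame(const Symbol *syms, size_t n, uint64_t pc,
                 char *out, size_t outsz)
{
    const Symbol *s = nearest_symbol(syms, n, pc);
    if (s == NULL)
        return snprintf(out, outsz, "0x%llx", (unsigned long long)pc);
    return snprintf(out, outsz, "%s+0x%llx", s->name,
                    (unsigned long long)(pc - s->addr));
}
```

With only EXPORTS_ONLY available, the invented address 0x2f2a3 is labeled sqlite3CodecGetKey+0x1f2a3; with FULL_TABLE the same address is labeled internal_fn+0x2a3. In this model the name in a frame says only which known symbol lies closest below the address, and a large offset is a hint that the code at that address may belong to some other routine.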

One form of an execution thread going goofy is where a return address (in a real call stack) gets corrupted by an on-stack buffer being written past its end or by a write through a bad pointer, clobbering the stored address. Then the return, which amounts to loading that corrupted value into the instruction pointer, becomes a "jump to somewhere random", followed by interpretation of random memory content as instructions. This can go on for some time before an attempt is made to execute in a non-code address region, or an access through a pointer produces an address fault and an ostensible stack dump.
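
The overwrite described above can be modeled without actually smashing a real stack. This is only a toy sketch: ToyFrame, unchecked_copy, and return_address_intact are hypothetical names, and the struct merely imitates the layout "local buffer, then saved return address". Real frame layouts depend on the compiler and ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model of a stack frame: a local buffer followed by the
   saved return address. Not a real ABI layout. */
typedef struct {
    char      buf[8];
    uintptr_t saved_return;
} ToyFrame;

/* Copy len bytes into the frame starting at buf WITHOUT checking
   len against sizeof buf, which is the bug being modeled. The copy
   is only clamped to the whole frame so that the model itself
   stays well-defined C. */
void unchecked_copy(ToyFrame *f, const void *src, size_t len)
{
    size_t room = sizeof *f - offsetof(ToyFrame, buf);
    assert(len <= room);
    memcpy((unsigned char *)f + offsetof(ToyFrame, buf), src, len);
}

/* Nonzero if the saved return address still holds the expected value. */
int return_address_intact(const ToyFrame *f, uintptr_t expected)
{
    return f->saved_return == expected;
}
```

Copying 3 bytes leaves saved_return untouched. Copying sizeof(ToyFrame) bytes of 'A' replaces it with a value made entirely of 0x41 bytes, and a real return through such a value would be the "jump to somewhere random".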

I don't know what more you can or should do, since I don't know what you have done, or what the customer situation is. But you should be ready to see if the problem repeats, with a similar or identical "stack dump". If it becomes repeatable, then it can be induced while running the program under a debugger to see what happened just before things go awry. Unfortunately, we normally cannot tell what led to a problem from the downstream results when they are as strange as your alleged stack dump.

If your customer is willing, you should at least consider getting an account of what situation and inputs immediately preceded the crash. Then you have a chance of recognizing repetition of the same problem. (And it's not the same when all you can say is "The program crashed.")

It is the near-hopeless situation of facing mysteriously induced crashes that makes many of us try to be very careful about writing C or C++ (or assembler) or other near-machine-level code. It also explains the attraction of so-called "managed code" execution systems, such as Java, Python, .Net and many others.

(5) By tham (thamht4190) on 2021-02-19 07:11:45 in reply to 4

Thanks, Larry, for your very clear explanation. I learned more about how a stack trace is reconstructed from a dump.

And this is my situation:

The crash report was sent automatically. The customer may not even be aware of the crash, because it happened while he/she was shutting down the machine and our background service was saving some resources and then closing the database. The crash occurred just before the database was closed.
This part of the code was written several years ago and has not been updated since, and I haven't seen this crash before. I also reviewed the source code for possible race conditions and tried to reproduce the crash, but without success. So maybe Richard is right that "our stack is badly corrupted, perhaps due to a buffer overwrite in some other part of the process."

Since the stack trace doesn't give much information, I think I will ignore this issue until the same crash recurs or more information becomes available.