after crash loosing all records in a multi-GB table

(1) By Stefan Schwarzer (anopheles63) on 2020-09-07 14:14:25 [link] [source]

Dear forum members / sqlite experts,

I am relatively new to sqlite and database programming is not at the center of my work, so I have a hard time finding the right keywords to search the forum. I apologize for questions with answers obvious to you or questions answered in previous posts or missing crucial information.

We are writing mixed image (BLOBs of constant size) and numerical (mixed INTEGER/REAL) information to a sqlite db table from a time critical data acquisition program (sqlite 3.33 on linux/debian, database resides on an ext4 file system). The database contains 5 (different) tables, three of which grow during the acquition process by being INSERTed INTO. The data base is opened by the spell

sqlite3_open_v2(full_path.string().c_str(), &db,SQLITE_OPEN_READWRITE|SQLITE_OPEN_CREATE|SQLITE_OPEN_NOMUTEX, nullptr)

Later,

char pragma[] = "PRAGMA synchronous = OFF";
int res = sqlite3_exec(db, pragma, NULL, NULL, &sql3_err);

is used for performance reasons. File access is from a single thread, writing typically 100 MB/s (The application is multi-threaded and writes in parallel to multiple data bases, but to each one only from a single thread). The INSERTs are issued at more or less constant intervals, bracketed between BEGIN and END TRANSACTION, with a period of 5 s.

The tables' PRIMARY key is either a single INTEGER, growing by 1 from INSERT to INSERT or a pair of INTEGERs, of which the first grows by 1 and remains constant while a second INTEGER ranging from 0 to a small count (say typically 1..5) increases.

At the end of a data acquistion run the databases have typically reached sizes of 30-100 GB. Unfortunately, our acquisition program acquired the bad habit to crash from time to time at the end of the acquisiton. At this time, data writing has slowed down to 0, the last transaction is not yet finalized, but no new INSERT takes place. In this situation the crash happens, leaving the multi-10-GB .db file and a .db-journal file behind. I am trying to recover as much data as possible from these two files. I have checked the following steps on multiple damaged data bases:

simply re-opening the database (e.g. in the sqlite3 shell) seems to replay the journal with the effect that the two largest growing tables are truncated to 0 records, the third largest still has about 1% of its expected number of records and the two initially written tables are correctly recovered. The database shrinks to approx. 500 kB in this process and the journal is deleted.
If I rename the journal and perform a dump on the remaining .db file (following the ideas/suggestions in https://stackoverflow.com/questions/18259692/how-to-recover-a-corrupt-sqlite3-database):
```
echo ".dump" \| sqlite3 "faulty.db" > sql
```
the sql shows that the two truncated tables will be created, but no data will be inserted after that. Using "PRAGMA integrity_check" returns "ok", but the database is truncated as in (1)
I have used demo versions of two affordable commercial data base recovery tools (SQLiteDoctor/SQLite Database Recovery), who arrive at mostly the same conclusion. Both large tables will be truncated to 0 records.

My questions:

is there anything obvious I could try in addition to the above in order to recover some of the data of the growing tables?
is there any procedure by which to make the database more 'resistant' or to help the recovery process [with the exception of touching the setting of synchronous = OFF?
Would you think that it is feasible or reasonable to attempt to write code to parse the database file for the remnants of the data tables (I am afraid that I will not be able to go beyond what the commercial solutions would do...)?

Thanks in advance for your help/opinions,

Stefan

(2) By Richard Hipp (drh) on 2020-09-07 20:03:59 in reply to 1 [link] [source]

With "synchronous=OFF", "writes" to the database file are not really writes, they are just moving the content into kernel space and then asking the kernel to relay the content onto disk at its convenience. If the OS crashes before that happens, the data is lost, because it never reached persistent storage. I'm not sure there is anything you can do about that, other than avoiding the use of "synchronous=OFF".

Have you tried "PRAGMA journal_mode=WAL;" with "PRAGMA synchronous=NORMAL;"? That combination of options might work better for you.

(3) By Stefan Schwarzer (anopheles63) on 2020-09-09 09:35:12 in reply to 2 [link] [source]

Thank you Richard. 

I was hoping that there were more options available given that I am dealing with an application crash, and the system obviously did write some data at least...

However, things are what they are and we will try the WAL journal option and see if performance remains sufficient. Thanks a lot again for that hint.

Best Regards, 
Stefan

(4) By Dan Kennedy (dan) on 2020-09-09 11:24:05 in reply to 3 [link] [source]

given that I am dealing with an application crash

An application crash on a working file-system should not cause you to lose any data already committed to the db, regardless of the "PRAGMA synchronous" setting. Are you sure you're losing data written before the most recent "COMMIT" on the same db handle?

In this context "working file-system" means pretty much everything except perhaps some versions of some network file-systems.

(5) By Richard Damon (RichardDamon) on 2020-09-09 11:44:44 in reply to 1 [link] [source]

It sounds like your application 'crashed' before doing a commit, which means the automatic action on opening the database is to perform a Roll-back, discarding the data.

You could perhaps modify your application to commit more often, so you lose less data in a crash.

It would be 'possible' to write code to manually parse the database and perhaps do the equivalent of a 'Commit' at this point, but it is likely better to fix teh problem and not just the symptoms. The biggest problem with forcing a commit is you don't really know (I presume) exactly where you are in the transction, and some of the data may be inconsistant.

(6) By Stefan Schwarzer (anopheles63) on 2020-09-09 22:33:24 in reply to 5 [link] [source]

Hmm, now I am getting a bit confused:

We are actually committing (END TRANSACTION) every 5 seconds. We are exclusively writing to one data base connection from a single thread without interspersed concurrent reads, so everything is as sequential and controlled as it can be. I am also quite certain that the commit is happening, as I remember that during development without the BEGIN/END TRANSACTION we have just seen growing RAM consumption and no on disk file growth.

Thus, my expectation really was that we would at most loose the last 5 seconds worth of data. The reason for my post was that we collected about 70 GB (about 500..1000 seconds) of data on disk (=final database file size) at the moment of the crash, but observed that, after the journal was rolled back, the file was truncated to about 500 kB.

The entire code is too involved to be posted, but if the behavior may indicate a bug in sqlite and anybody is interested, I could probably undertake the endeavour to produce a test case...

(7.1) By Keith Medcalf (kmedcalf) on 2020-09-09 22:52:25 edited from 7.0 in reply to 4 [source]

That is not true, Dan.

There is absolutely no way that one can be "certain" that an application write to persistent storage occurred at all UNLESS the application terminated cleanly OR the application told the Operating System and File System to "make it so" from time to time.

Some people may view this as a "feature", ill-conceived though it may be. Without detailed information there is absolutely no way to determine that the combination of "application", "filesystem" and "operating system" in use does not implement this "feature" given that "make it so" has been explicitly disabled, and the application has not exited cleanly.

In other words, "make it so" must be re-instated and see if the result is the same. If it is not, then clearly one must issue periodic "make it so" commands to the filesystem and operating system in order to bypass this "feature".

NB: "make it so" means to force the synchronization of persistent storage with the application view of that storage at the time the command to "make it so" is given.

(8) By Richard Damon (RichardDamon) on 2020-09-09 23:00:33 in reply to 6 [link] [source]

If your app is committing, but the commits aren't apperently being seen in the file, it shouldn't be because of the SYNCRONOUS = OFF, as the OS isn't crashing. It may be that whatever is crashing the program is also corrupting the database forcing the loss of data.

Doing some forensics on the database might give a hint to the sort of corruption, and might help trace the bug, or it might be a lot of work just to say that it got corrupted.

It may be easier to try to trace back to find the cause of the crash.

(9) By Larry Brasfield (LarryBrasfield) on 2020-09-09 23:52:54 in reply to 6 [link] [source]

I second Mr. Damon's idea that it may be easier to debug the application crash.

There is another, perhaps better reason to focus on that early. The crash may well be producing ill effects other than sudden termination of the process, effects which occur sooner than that bitter end. These effects can include corruption of the heap, including the one used by SQLite. They can include corruption of in-process data structures used for file I/O. They can include corruption of application data which drive the data collection and subsequent storage. Any of these ill effects could be involved in producing the symptoms you see (and apparently hope might be attributed to SQLite. [a])

[a. That attribution is highly unlikely and a waste of brain cycles at this point. ]

Your report of unbounded allocated memory growth also indicates that some code cleanup is due. Once that cleanup is done and the application address faults are vanquished, you may well find that SQLite behaves per your expectation. (You can use a process kill to simulate the crash to ascertain the 'D' part of SQLite's ACID guarantees.) If it behaves well then, and you have spent time fixing actual bugs in your code, you will be glad to have not wasted time wondering why SQLite behaved strangely in an undefined [b] execution environment.

[b. The phrase "undefined behavior" is used to describe the effect of many deviations from correct usage of the C or C++ languages. That behavior includes the kind of mysterious misbehavior you are hoping to find a remedy for here. ]

(10.1) By Dan Kennedy (dan) on 2020-09-10 06:04:46 edited from 10.0 in reply to 7.1 [link] [source]

application crash on a working file-system should not cause you to lose any data already committed

That is not true, Dan.

There is absolutely no way that one can be "certain" that an application write to persistent storage occurred at all UNLESS the application terminated cleanly OR the application told the Operating System and File System to "make it so" from time to time.

What kind of system are we talking about?

Once COMMIT has been executed, SQLite has transferred all the data to buffers managed by the OS - by calling write() or equivalent. So that the data is available to other file-system users. Are there systems that will discard these buffers if the application crashes instead of terminating cleanly?

Of course if the power fails or OS crashes before the OS buffers are actually flushed to disk you lose data, but what has that got to do with the application exiting cleanly?