Is it possible to fix a more than 1G corrupted SQLite file with only the first 16K content corrupted?

(1) By anonymous on 2022-01-20 02:41:17 [link] [source]

Hello,

We have a 1GB+ SQLite file with the first 16K content corrupted. I am wondering, is it possible to fix/repair that file completely? Many thanks in advance.

Lirong

(2) By Larry Brasfield (larrybr) on 2022-01-20 03:15:54 in reply to 1 [link] [source]

It is conceivable but likelihood depends a lot on the nature of the corruption.

Given that the DB schema is stored in the first 16 Kbytes, it may well have been clobbered, much diminishing chances of a useful recovery. What does pragma_integrity_check have to say?

Have you tried running .recover on a copy of your corrupted DB?

Do you have any way of ascertaining whether the result of a .recover at least contains your data, perhaps only missing indexes (which can be readily rebuilt)?

I urge you to consider whether "completely" is the only degree of recovery useful to you. You should be aware that it may not be possible to tell whether all the actual data was recovered.

What is certain is that there is not enough redundancy in a SQLite DB file to permit arbitrary alteration of the first 16 Kbytes without making it uncoverable for most such alterations. So a lot depends on how corrupted your DB is.

(3.1) By ddevienne on 2022-01-20 07:48:59 edited from 3.0 in reply to 1 [link] [source]

In addition to what Larry wrote, what was the page_size of that DB?

If it was 16K, and you are sure only the first 16K were clobbered,
then you might have a better chance of recovery, if you know what
the schema of that DB is supposed to be.

The 1st page of an SQLite DB contains both the 100 bytes header,
and the content of the all-important sqlite_main (aka sqlite_master)
table. If large enough, sqlite_main will spill to more pages,
but most smallish schemas fit in a single page.

So recreate an empty DB with the same schema, with the same 16K page_size,
and overlay that 1st 16K page of the empty DB on top of the corrupted DB,
(on a copy of course, as Larry mentioned), and see what happens.

If your DB uses sequences, you might have to fix those, but I think they
are in a separate table, thus separate page.

If OTOH your DB uses a 4K page_size, then potentially more tables than just
sqlite_main are corrupted, which might contain data and/or a non-leaf page
of the B-Tree, and thus things might be more difficult to recover your DB...

I suggest you carefully read the file-format doc, in any case.

There's also at least one person of this list that does commercial DB
recovery using specialized tools, which might be able to help, if that
DB really has a lot of value.

I hope that's helpful. Good luck, --DD

(5) By anonymous on 2022-01-20 16:55:35 in reply to 3.1 [link] [source]

My DB uses a 4K page_size.

Yes, of course, I am looking for commercial DB recovery.

(6) By ddevienne on 2022-01-20 17:46:57 in reply to 5 [link] [source]

Bummer.

How can you be sure only the first 16K are corrupted?

Can you share the schema of the DB?
Along with a hex-dump (e.g. od or xxd) of those first 16K?
(possibly with a way to fetch that 16KB file from some URL).

I'm guessing you can't, but in case you do, might be an interesting exercise on this list.
You never know what could happen, lots of technical folks here who like challenges :)

(4) By Harald Hanche-Olsen (hanche) on 2022-01-20 09:07:10 in reply to 1 [link] [source]

Yet another idea: This is assuming that you have an old backup. Presumably, you don't want to use that, since it will contain outdated data. But you might try using the first 16K from it! With luck, it won't have changed.

(7) By Simon Slavin (slavin) on 2022-01-21 13:06:03 in reply to 4 [link] [source]

This was my idea too. Working only on copies of the files, to prevent making the originals worse …

Find an older version of the database which has become corrupted.
Copy the first 16K of it.
Overwrite the first 16K of the corrupted database with that copy.
Open the resulting file in the SQLite command-line shell. Does it open ?
If so, run an integrity_check on it. Does it pass ?
Use the .recover command on it. Do you get anything useful back ?

(8) By David Raymond (dvdraymond) on 2022-01-21 15:46:00 in reply to 1 [source]

I'm curious how you know only the first 16k was corrupted.

The schema is always on the first page (or at least starts there), and the next 3 pages could in theory be anything.

If those pages are garbage, you can still recover all the individual records that are completely stored on pages other than the first 4 there. Any data in those first 4 pages though is of course gone.

Because the schema is corrupted, you won't necessarily know what table each record belongs to. But you can put them in big groups and say look,
we know everything in group A was in the same table,
we know everything in group B was in the same table,
we know everything in group C was in the same table.
But we don't know which tables A, B and C were, and they might even be the same one.

If you know what your schema looked like, and all of your tables had different numbers of fields, then it's nice and easy to say ohh, these records were 12 fields long, they must have been from table X.

But if you had multiple tables with the same number of fields it can be harder to determine what was what. Maybe you yourself can figure it out based on the contents and what type of data was in each field, maybe not.
Maybe you can look at the rowid's in groups A and B and see that they overlap, and thus be sure they're from different tables, or maybe they won't overlap and you won't know.

Etc.

(9) By ddevienne on 2022-01-21 16:22:14 in reply to 8 [link] [source]

the next 3 pages could in theory be anything

Note quite, excluding a vacuum or autovacuum.

(and even then, the vacuum is likely to follow a deterministic pattern for low page IDs I suspect).

Because each CREATE DLL will allocate at least one page per table and index(es),
so page#2 is likely to be the 1st table created, page#3 the 2nd table, unless the 1st table had an index.
At least that's my experience.