SQLite Forum

How much would you trust a page-level checksum ?
Okay, folks.  Thank you for your answers.  The most significant points from your answers are: first, that the checksum algorithm, like all good checksum algorithms, is sensitive to position, so swapping two octets, or the values of two different rows, changes the checksum; and second, that any attempt to fool the checksum algorithm would be ridiculously difficult without reading the raw database file.
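
To make the position-sensitivity point concrete, here is a minimal sketch of a position-sensitive 64-bit checksum (FNV-1a).  This is just an illustration, not the checksum SQLite itself uses:

```c
/* Illustration only: a simple position-sensitive 64-bit checksum
** (FNV-1a).  This is NOT the algorithm SQLite's checksum VFS uses.
** Because each byte is folded into a state that depends on everything
** before it, swapping two octets changes the result. */
#include <stdint.h>
#include <stddef.h>

static uint64_t page_checksum64(const unsigned char *p, size_t n){
  uint64_t h = 0xcbf29ce484222325ULL;   /* FNV-1a offset basis */
  for(size_t i=0; i<n; i++){
    h ^= p[i];
    h *= 0x100000001b3ULL;              /* FNV-1a 64-bit prime */
  }
  return h;
}
```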

So let us suppose I trust that a 64-bit checksum (either the one already used, or a better one) is good enough for my purpose.  Now I ask the remaining question.

SQLite currently has code which stops it from writing records to the file when they haven't changed.  Suppose it checked for changes using a page-level checksum instead.  Would that be faster, and/or lead to a smaller codebase, than the current checks carried out at the row/field level ?  I'm prepared for answers like "Sometimes yes, sometimes no." or "Impossible to tell without writing the code, and we don't have the time.  We look forward to seeing your results."
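
By "checked for changes using a page-level checksum" I mean something like the hypothetical sketch below.  The struct and function names are invented for illustration and are not SQLite's internal API; it reuses page_checksum64 from the sketch above:

```c
/* Hypothetical sketch only: skip writing a page whose current image
** has the same checksum as the image that was originally read.  The
** names Page, pData, chkOnRead and pageWriteIfChanged are invented,
** not SQLite internals. */
#include <stdint.h>
#include <stddef.h>

uint64_t page_checksum64(const unsigned char *p, size_t n);  /* from the sketch above */

typedef struct Page Page;
struct Page {
  unsigned char *pData;    /* in-memory page image */
  size_t nSize;            /* page size in bytes */
  uint64_t chkOnRead;      /* checksum taken when the page was read in */
};

/* Returns 0 if the write was skipped, otherwise whatever xWrite returns. */
static int pageWriteIfChanged(Page *pPg, int (*xWrite)(Page*)){
  uint64_t chkNow = page_checksum64(pPg->pData, pPg->nSize);
  if( chkNow==pPg->chkOnRead ){
    return 0;              /* image unchanged: pointless rewrite avoided */
  }
  pPg->chkOnRead = chkNow; /* remember the new checksum */
  return xWrite(pPg);      /* image changed: really write it */
}
```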

One sacrifice would be that you'd have to take the time to calculate a checksum for every page written, and that database files would be a little larger.  Another is that SQLite would occasionally have to write two pages instead of one, making SQLite **slower**.  One advantage would be that, since every page would have a checksum, it would be trivial to verify those checksums every time a page was read, and thereby spot page-level corruption of a database file.  Another is that, under normal circumstances, a page contains data for more than one row, so fewer checks would need to be done, making SQLite **faster**.
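
The read-time verification I have in mind would be something like this hypothetical sketch, which assumes the last 8 bytes of each page are reserved for the checksum of the preceding bytes.  That layout is an assumption for illustration, not SQLite's actual file format:

```c
/* Hypothetical sketch only: verify a per-page checksum whenever a page
** is read back, to spot page-level corruption.  Assumes the final 8
** bytes of each page hold a checksum of the preceding bytes. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

uint64_t page_checksum64(const unsigned char *p, size_t n);  /* from the sketch above */

/* Returns 1 if the page looks intact, 0 if it looks corrupt. */
static int pageVerify(const unsigned char *pData, size_t nSize){
  uint64_t stored;
  if( nSize<=8 ) return 0;                 /* no room for a checksum */
  memcpy(&stored, pData + nSize - 8, sizeof stored);
  return stored==page_checksum64(pData, nSize - 8);
}
```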

I don't know how complicated and time-consuming it is for SQLite to check for pointless rewrites, or what proportion of pointless rewrites the current code catches.  Thoughts welcome.