How much would you trust a page-level checksum ?

Is computing one checksum and comparing it to another faster than just comparing two sequences? That depends on how fast the checksum algorithm is, and on the relative speeds of computation and sequential memory access. I'm sure people can find systems where either one wins.
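
For concreteness, here is a minimal sketch of the two approaches. The toy FNV-1a hash, the 4096-byte page, and the buffer handling are placeholders for illustration, not anything SQLite actually does:

```c
/* Minimal sketch: two ways to ask "did this page change?".
 * FNV-1a is a stand-in for whatever checksum you would actually use. */
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define PAGE_SIZE 4096

/* Toy 64-bit FNV-1a; a real deployment would pick something stronger. */
static uint64_t fnv1a64(const unsigned char *p, size_t n) {
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Option A: keep a full pre-image and compare bytes.
 * Reads both buffers, but can stop at the first differing byte. */
static int changed_by_memcmp(const unsigned char *before,
                             const unsigned char *after) {
    return memcmp(before, after, PAGE_SIZE) != 0;
}

/* Option B: keep only the old checksum and rehash the current page.
 * Reads one buffer plus 8 bytes, but always scans the whole page,
 * and a collision would make a real change look like "no change". */
static int changed_by_checksum(uint64_t old_sum, const unsigned char *after) {
    return fnv1a64(after, PAGE_SIZE) != old_sum;
}

int main(void) {
    unsigned char before[PAGE_SIZE] = {0}, after[PAGE_SIZE] = {0};
    uint64_t old_sum = fnv1a64(before, PAGE_SIZE);

    after[100] = 0x42;  /* simulate an in-place update */

    printf("memcmp says changed:   %d\n", changed_by_memcmp(before, after));
    printf("checksum says changed: %d\n", changed_by_checksum(old_sum, after));
    return 0;
}
```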

Even if you get around that issue, I believe a 64-bit checksum is too small.

The birthday paradox says (roughly) that, given random numbers in 0 ... N, you should expect duplicates to show up once you have drawn roughly sqrt(N) of them.

With a 64-bit checksum that is sqrt(2^64) = 2^32, or about 4.3 billion page images; at the smallest SQLite page size, that "roughly" shows up at a couple of terabytes of data. That is too small for comfort, and one reason UUIDs aren't only 64 bits.
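
The arithmetic behind that, in case anyone wants to check it (512 bytes is SQLite's minimum page size; the rest is just multiplication):

```c
/* Back-of-the-envelope for the birthday bound: with a 64-bit checksum,
 * expect a collision after roughly sqrt(2^64) = 2^32 distinct page images. */
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint64_t collision_pages = 1ULL << 32;   /* ~4.3 billion page images   */
    uint64_t min_page_size   = 512;          /* smallest SQLite page size  */
    uint64_t bytes = collision_pages * min_page_size;

    printf("pages before expected collision: %llu\n",
           (unsigned long long)collision_pages);
    printf("data at 512-byte pages: %.1f TiB\n",
           bytes / (1024.0 * 1024.0 * 1024.0 * 1024.0));
    return 0;
}
```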

I think you'd want to use a good 128-bit checksum instead.

As to whether it is better to check for changes as you update a row or field, or to just wait until commit time and check the entire page, my guess is:

1) Doing checks at the page level is simpler to code, if your only concern is to avoid writing unchanged pages (a rough sketch of that check follows this list). If you also want to avoid updating indexes that refer only to unchanged fields (I don't know if SQLite does that optimization), I'd guess that waiting to see whether the page is dirty before updating indexes makes things trickier.

2) I suspect that page-level checks (with or without a checksum) are a performance boost only if it is very common for a transaction's sequence of update/delete/insert operations to leave a page with no net change.
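
Here is the kind of commit-time check I have in mind. The Page struct and the bookkeeping are made up for illustration and are not how SQLite's pager actually works:

```c
/* Rough sketch of the commit-time idea in (1)/(2): remember each page's
 * pre-transaction image when it is first dirtied, then at commit skip
 * pages whose bytes ended up unchanged. */
#include <string.h>
#include <stdio.h>

#define PAGE_SIZE 4096

typedef struct {
    unsigned char data[PAGE_SIZE];     /* current in-memory content   */
    unsigned char snapshot[PAGE_SIZE]; /* content when first dirtied  */
    int dirtied;                       /* touched during this txn?    */
} Page;

/* Called before the first write to a page in a transaction. */
static void page_mark_dirty(Page *pg) {
    if (!pg->dirtied) {
        memcpy(pg->snapshot, pg->data, PAGE_SIZE);
        pg->dirtied = 1;
    }
}

/* At commit: write only pages whose content actually changed.
 * Returns the number of pages written. */
static int commit_pages(Page *pages, int npages) {
    int written = 0;
    for (int i = 0; i < npages; i++) {
        Page *pg = &pages[i];
        if (pg->dirtied && memcmp(pg->snapshot, pg->data, PAGE_SIZE) != 0) {
            /* write_page_to_disk(i, pg->data);  placeholder for real I/O */
            written++;
        }
        pg->dirtied = 0;
    }
    return written;
}

int main(void) {
    Page pages[2];
    memset(pages, 0, sizeof(pages));

    /* Page 0: an insert followed by a delete -> no net change. */
    page_mark_dirty(&pages[0]);
    pages[0].data[10] = 7;   /* insert */
    pages[0].data[10] = 0;   /* delete puts the byte back */

    /* Page 1: a change that sticks. */
    page_mark_dirty(&pages[1]);
    pages[1].data[20] = 9;

    printf("pages written at commit: %d\n", commit_pages(pages, 2));
    return 0;
}
```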

I do see value in checksums, just to catch accidents or bugs. I just don't believe they are likely to be much of a speedup for learning whether a bunch of small changes resulted in any net change.