SQLite Forum

How much would you trust a page-level checksum ?
Bill, I have long had a policy against getting into calendar arguments or arithmetic arguments. I was tempted to add statistics and probability arguments to that list and ignore your (mistaken) contention.

But that square-root assertion just sticks in my craw, and using it or the Birthday Paradox to show me wrong is just too much. So here is some SQL to calculate the correct¹ probability of no shared birthdays among 23 people (ignoring the complication of people born on February 29):

```sql
select exp(sum(ln_odds_against)) as group_odds_against from (
  with recursive SharedBirthdayShouters(bfn)
    as (values(1) union all select bfn+1
        from SharedBirthdayShouters
        where bfn < 23)  -- stop after bfn = 23, i.e. 23 people
  select ln(1.0 - (bfn-1)/365.0) as ln_odds_against
  from SharedBirthdayShouters
);
```

You should notice that there are no square roots anywhere in that calculation. And the significance of "23^2 has similar magnitude as 365" completely escapes me.

There is a much more fundamental problem with your dismal view of the odds of checksum failure (and with your contradiction of my assertion about them). The Birthday Paradox simply does not apply. If we imagine all X billion people somehow able to discover whether any checksum they collected matched one that anybody else collected, then you could use the same arithmetic that explains that pseudo-paradox to predict the odds. But that is not the arithmetic that predicts how often any one of X billion people will see a repeat of the checksum given once to each of them at the start of those 74 years, with each person looking for a repeat only in their own series. The birthday problem is about matches anywhere among a set of limited-range random numbers. It is not about matches against a single one.
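The difference between the two calculations can be sketched in a few lines of Python. This is a rough illustration only. It uses 365 days as a small stand-in for the 2^64 checksum range and does not model the X-billion-people, 74-year scenario. The function names are hypothetical. One function gives the birthday-style probability that any two draws coincide. The other gives the probability that later draws hit one particular fixed value.

```python
import math

DAYS = 365  # small stand-in for the number of possible values (2**64 for a checksum)

def p_any_pair_matches(n: int, size: int = DAYS) -> float:
    """Birthday problem: probability that at least two of n draws coincide."""
    p_none = math.prod(1 - k / size for k in range(n))
    return 1 - p_none

def p_matches_one_value(n: int, size: int = DAYS) -> float:
    """Probability that at least one of n further draws equals one fixed value."""
    return 1 - (1 - 1 / size) ** n

# 23 people all comparing against each other ...
print(round(p_any_pair_matches(23), 4))   # 0.5073
# ... versus 22 other people compared against one particular birthday
print(round(p_matches_one_value(22), 4))  # 0.0586
```

The first number is the familiar "better than even" birthday result. The second is the kind of per-person, single-value matching the paragraph above describes, and it is far smaller.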

If you want to prove yourself right about the chances of the same checksum occurring among a fixed number of randomly created ones, just plug power(2,64) in place of 365 (and that fixed number in place of 23) in that query. But you should not claim that the answer contradicts my much simpler (and faster) probability calculation.
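When the range of values is power(2,64) rather than 365, each factor 1 - k/2^64 sits so close to 1.0 that a double can lose it entirely. For small k, 1.0 - k/2^64 rounds to exactly 1.0, so its ln is 0. This matters for the SQL version too, because SQLite's REAL is an IEEE double. Below is a hedged Python sketch of the same product that sums log1p(-k/N) instead. The name p_collision is hypothetical, and the million-checksum case is an illustrative size, not a figure from this thread.

```python
import math

def p_collision(n: int, size: int) -> float:
    """Probability that at least two of n uniform draws from `size` values coincide.

    Same product as the SQL query, prod(1 - k/size) for k = 0..n-1, but summed
    in log space with log1p(-x), which keeps precision when x = k/size is tiny
    compared with double-precision epsilon.
    """
    log_p_none = math.fsum(math.log1p(-k / size) for k in range(n))
    return -math.expm1(log_p_none)  # 1 - exp(log_p_none) without cancellation

students = p_collision(23, 365)        # the 23-student case, about 0.5073
million = p_collision(1_000_000, 2**64)  # a million 64-bit checksums, about 2.7e-8
print(students, million)
```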

----

1. If 23 students are asked to speak their birthdays in turn, and all are effectively compelled to shout "That's mine too!" if they hear their birthday spoken (by another student), what is the probability that none of them will so shout?

The probability of that shout not happening as the 1st one speaks is: (1 - 0/365)
The probability of that shout not happening as the 2nd one speaks is: (1 - 1/365)
The probability of that shout not happening as the Nth one speaks is: (1 - (N-1)/365)

The probability that the whole group will finish speaking without so shouting is the product of the probabilities at each step.
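As a cross-check that does not depend on floating point, the same product can be computed exactly with Python's fractions module. This is only a sketch, and p_no_shout is a hypothetical helper name. It multiplies (1 - (N-1)/365) for N = 1..students as exact rationals and converts to a float only at the end.

```python
from fractions import Fraction
from math import prod

DAYS = 365

def p_no_shout(students: int, days: int = DAYS) -> Fraction:
    """Exact probability that nobody shouts: product of (1 - (N-1)/days) for N = 1..students."""
    return prod(1 - Fraction(n - 1, days) for n in range(1, students + 1))

p = p_no_shout(23)
print(float(p))  # about 0.4927
```

With 366 students the last factor is (1 - 365/365) = 0, so the product is exactly 0, as the pigeonhole principle requires.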

Since SQLite has no built-in aggregate product function, I simulate one by taking the natural log of the probability at each step, summing those logs over the whole group, and then taking the inverse natural log (exp) of the sum.
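Below is a sketch of running that log-sum query from Python's standard sqlite3 module. It sits alongside an alternative the post does not use: registering a custom product aggregate with Connection.create_aggregate. The class Product and the SQL name product are hypothetical. The code registers ln and exp from Python because the SQLite library that Python links against may have been built without the math functions. Application-defined functions should take precedence over any built-in versions.

```python
import math
import sqlite3

# Log-sum query, with the recursion stopping after 23 rows (bfn = 1..23).
LOG_SUM_QUERY = """
select exp(sum(ln_odds_against)) as group_odds_against from (
  with recursive SharedBirthdayShouters(bfn)
    as (values(1) union all select bfn+1
        from SharedBirthdayShouters
        where bfn < 23)
  select ln(1.0 - (bfn-1)/365.0) as ln_odds_against
  from SharedBirthdayShouters
)
"""

# Same rows, multiplied directly by the hypothetical 'product' aggregate.
PRODUCT_QUERY = """
with recursive SharedBirthdayShouters(bfn)
  as (values(1) union all select bfn+1
      from SharedBirthdayShouters
      where bfn < 23)
select product(1.0 - (bfn-1)/365.0) from SharedBirthdayShouters
"""

class Product:
    """Hypothetical aggregate: SQLite has no built-in product, so register one."""

    def __init__(self):
        self.value = 1.0

    def step(self, x):
        self.value *= x

    def finalize(self):
        return self.value

conn = sqlite3.connect(":memory:")
conn.create_function("ln", 1, math.log)    # in case the build lacks math functions
conn.create_function("exp", 1, math.exp)
conn.create_aggregate("product", 1, Product)

via_logs = conn.execute(LOG_SUM_QUERY).fetchone()[0]
via_product = conn.execute(PRODUCT_QUERY).fetchone()[0]
conn.close()
print(via_logs, via_product)  # both about 0.4927
```

The log-sum trick needs nothing beyond ln() and exp(). The custom aggregate avoids the round trip through logs but only works where you control the connection.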

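For the 23 students, the probability of no shouts is about 0.49 (0.4927), as anybody can see using a recent version of the SQLite shell (ln() and exp() are among the math functions added in version 3.35.0). The recursion must stop at bfn = 23 ('where bfn < 23'). With 'where bfn <= 23' it generates 24 rows and yields about 0.46, which is the value for 24 people.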