
Any use case for the Bloom filter for "large analytic queries"
For Q2.2, the run-time in my tests went from 46.261 seconds down to 3.402 seconds.
That was the best improvement; a 2x speedup is more typical.  Your mileage may vary.

For some table X, if the query planner sees that it is going to need to
do N searches (binary look-ups) against X, where N is larger than the number
of rows in X, and if the query planner has some reason to believe that
many of those searches will end with "not found", then it goes to the
trouble of computing the Bloom filter.  Computing the Bloom filter involves
doing an initial scan of the entire X table, so there is a significant
set-up cost.  Hence, the query planner wants to be reasonably sure there
will be an overall benefit before incurring that cost.  If the query planner
gets that decision wrong, there could be a slowdown.
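
A minimal sketch of the idea in C (hypothetical names and sizing, not SQLite's
internal API): one full pass over X's join-key column populates the filter, and
every probe asks the filter before paying for a binary search.  A Bloom filter
can return a false positive but never a false negative, so a "no" answer lets
the search be skipped safely.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FILTER_BITS (1u << 20)   /* bit-array size; chosen arbitrarily here */

typedef struct BloomFilter BloomFilter;
struct BloomFilter {
  uint8_t bits[FILTER_BITS / 8];
};

/* Toy 64-bit mix; SQLite uses its own internal hash. */
static uint64_t hashKey(int64_t key){
  uint64_t h = (uint64_t)key * 0x9E3779B97F4A7C15ull;
  return h ^ (h >> 31);
}

static void bloomAdd(BloomFilter *p, int64_t key){
  uint64_t b = hashKey(key) % FILTER_BITS;
  p->bits[b/8] |= (uint8_t)(1u << (b%8));
}

/* 0 means "definitely not in X"; 1 means "maybe in X" (false positives possible). */
static int bloomMaybeContains(const BloomFilter *p, int64_t key){
  uint64_t b = hashKey(key) % FILTER_BITS;
  return (p->bits[b/8] >> (b%8)) & 1u;
}

/* Set-up cost: one full scan over X's join-key column. */
void bloomBuild(BloomFilter *p, const int64_t *xKeys, size_t nRow){
  memset(p->bits, 0, sizeof(p->bits));
  for(size_t i=0; i<nRow; i++) bloomAdd(p, xKeys[i]);
}

/* Pay-off: each of the N probes asks the filter first, and only a
** "maybe" falls through to the expensive binary search against X. */
size_t countMatches(const BloomFilter *p,
                    const int64_t *probes, size_t nProbe,
                    int (*binarySearch)(int64_t key)){
  size_t nFound = 0;
  for(size_t i=0; i<nProbe; i++){
    if( !bloomMaybeContains(p, probes[i]) ) continue;   /* definite miss, no search */
    if( binarySearch(probes[i]) ) nFound++;             /* confirm the hit */
  }
  return nFound;
}
```

The break-even intuition above falls out of this shape: bloomBuild() costs
roughly one pass over X, and each probe that the filter rejects saves a full
binary search, so the filter only pays for itself when N is large and many
of the probes miss.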

Bloom filters are also deployed whenever an [automatic index][1] is created,
inasmuch as the incremental cost of computing the Bloom filter is then minimal.
(The use of an "automatic index" is what most other RDBMSes would call a
"hash join".)

I haven't yet run across a case where the query planner adds a Bloom filter
which ends up slowing things down.  But that doesn't mean no such case exists.

[1]: https://www.sqlite.org/optoverview.html#automatic_indexes