# Full bloom filter blob capacity is never used?

### (1) By J. Schaefer (jschaefer) on 2023-06-11 21:34:45

Hello,

I am currently trying to understand how the bloom filter hashing works in SQLite (latest trunk).

In *where.c* an `OP_Blob` is set up with at least 80K bits of space for the bloom filter:
```
sz = sqlite3LogEstToInt(pTab->nRowLogEst);
if( sz<10000 ){
sz = 10000;
// ...
sqlite3VdbeAddOp2(v, OP_Blob, (int)sz, pLevel->regFilter);
```

So there is now a blob with a size of at least 10000 bytes (80K bits).

Then in *vdbe.c*, in `case OP_FilterAdd`, the filter is filled.
Here `h` is the result of a hashing function, and now a single 1 bit needs to be set in the filter:

```
h = filterHash(aMem, pOp);
// ...
h %= pIn1->n;
pIn1->z[h/8] |= 1<<(h&7);
```

I think `pIn1->z` is the filter blob and `pIn1->n` is the size of the filter blob in bytes.
If, for example, the filter has the minimal size of 80K bits, `pIn1->n` is 10000 bytes.
Thus after the modulo operation `h` can be at most 9999.
Then `pIn1->z[h/8]` indicates that only bytes 0 to 1249 (==9999/8) can ever be addressed, and bytes 1250 to 9999 will always stay zeroed.

The same is of course happening in `case OP_Filter`.

If I am understanding this correctly, this means that for variable-sized filters 87.5% of the reserved filter space can never be used, thus raising the probability of false positives from ideally 11.75% to 63.2%?

Or have I overlooked something?

Thanks and best regards

J. Schaefer

### (2.1) Originally by Dan Kennedy (dan) with edits by Larry Brasfield (larrybr) on 2023-06-12 15:42:45 from 2.0 in reply to 1

You're quite right of course. Thanks for reporting this. Now fixed here.

Dan.

### (3) By Spindrift (spindrift) on 2023-06-12 15:14:47 in reply to 2.0

Hi Dan - I think you've accidentally linked back to this forum post rather than the check-in.