Performance problem with BLOOM filter on recursive query
(1) By Jiba (Jibalamy) on 2023-02-05 16:55:31 [source]
Hello, I encounter performance problems related to BLOOM filter in recursive queries, with SQLite version 3.40.0. I have a huge table named "objs" that has 3 columns, "s", "p" and "o" (corresponding to the subject, predicate and object members of RDF triples). Here is my query: WITH RECURSIVE transit(x) AS ( SELECT s FROM objs WHERE p=9 AND o=32805 UNION SELECT objs.s FROM objs, transit WHERE objs.p=9 AND objs.o=transit.x) SELECT x FROM transit; and the corresponding query plan : QUERY PLAN |--CO-ROUTINE transit | |--SETUP | | `--SEARCH objs USING COVERING INDEX index_objs_op (o=? AND p=?) | `--RECURSIVE STEP | |--SCAN transit | |--BLOOM FILTER ON objs (o=? AND p=?) | `--SEARCH objs USING INDEX index_objs_op (o=? AND p=?) `--SCAN transit Performance measured in CLI: Run Time: real 0.034 user 0.026844 sys 0.006627 If I disable BLOOM filter with ".testctrl optimizations 0x80000", the query plan is: QUERY PLAN |--CO-ROUTINE transit | |--SETUP | | `--SEARCH objs USING COVERING INDEX index_objs_op (o=? AND p=?) | `--RECURSIVE STEP | |--SCAN transit | `--SEARCH objs USING COVERING INDEX index_objs_op (o=? AND p=?) `--SCAN transit and the performance: Run Time: real 0.001 user 0.000503 sys 0.000116 which is about 34 times faster. Since my application runs many similar queries with different values, it has a high impact. I also noticed similar performance problems on other, more complex, queries, all of them involving RECURSIVE and BLOOM filter in the query plan. The entire database can be downloaded here: https://filesender.renater.fr/?s=download&token=196d5780-222d-4fe9-a3d0-657218ea74d4 Additional question: is there a way to disable BLOOM filter outside of the CLI (e.g. via a PRAGMA) ? Best regards, Jean-Baptiste LAMY
(2) By Richard Hipp (drh) on 2023-02-05 20:33:02 in reply to 1 [link] [source]
Should be fixed in the latest trunk check-in.
(3) By Jiba (Jibalamy) on 2023-02-07 09:04:48 in reply to 2 [link] [source]
The BLOOM filter problem is fixed in the Pre-release Snapshots sqlite-snapshot-202302052029.tar.gz. However, I now encounter a performance regression with that version (compared to 3.40.1), on the same database, when querying a view that is the union of two tables, both having a similar index. The "quads" view is defined as follows: CREATE VIEW quads AS SELECT c,s,p,o,NULL AS d FROM objs UNION ALL SELECT c,s,p,o,d FROM datas where both tables "objs" and "datas" have similar columns (data have "d" in addition), and both have an index on column "o". My query is : SELECT o,d FROM quads WHERE s=303 AND p=9; On 3.40.1, the query plan shows that indexes are used, and performances are good: Run Time: real 0.009 user 0.002006 sys 0.000000 QUERY PLAN |--CO-ROUTINE quads | `--COMPOUND QUERY | |--LEFT-MOST SUBQUERY | | `--SEARCH objs USING INDEX index_objs_sp (s=? AND p=?) | `--UNION ALL | `--SEARCH datas USING INDEX index_datas_sp (s=? AND p=?) `--SCAN quads But with the Pre-release Snapshots, indexes are not used and performances are much lower: Run Time: real 0.120 user 0.120538 sys 0.000000 QUERY PLAN |--CO-ROUTINE quads | `--COMPOUND QUERY | |--LEFT-MOST SUBQUERY | | `--SCAN objs | `--UNION ALL | `--SCAN datas `--SCAN quads