SQLite User Forum

Possible reasons for a select query running 100x faster using the CLI client than using the C library?

(1) By CMDR Vulkarius (CMDR_Vulkarius) on 2025-01-10 01:58:28 [source]

I have a query with a fair bit of nesting and grouping, but it ultimately needs only ~150 ms total to return ~100 rows using the sqlite3 command line client.
If I run the exact same SQL through the C library using sqlite3_prepare_v2 and sqlite3_step, it needs ~100 ms per row instead, ~10000 ms total.
Each call to sqlite3_step blocks for that long per row when running from my program, compiled with GCC on Linux.
I also tested compiling the CLI client myself - that then runs just as slowly, i.e. the distro-packaged sqlite3 somehow does it unaccountably more quickly!
That package is built from the Ubuntu source; my PC running Linux Mint and a server running Ubuntu Server both show the same symptom: slow API, fast CLI.

The query itself looks like this...

select dat.id, dat.k, dat.name, sum(c.ww) weight, sum(c.ww*n.c_norm) norm
from (select c_id, w_ids, 1.0/json_array_length(w_ids) ww from clist where w_ids!='[]') c
join (select c_id, 1.0/count() c_norm from clist group by c_id) n on n.c_id=c.c_id
left join json_each(c.w_ids) w
join dat on w.value=dat.id
group by dat.id

Just in case the schema is involved, here it is (I know some types are non-SQLite; this is just the result of running .schema)...

CREATE TABLE s (id integer primary key,k varchar(64) not null unique,name varchar(64) not null);
CREATE TABLE r(id tinyint not null unique,name varchar(32) not null,primary key (id));
CREATE TABLE clist(t bigint not null,vs tinyint not null,s_id integer not null,r_id tinyint not null,ws tinyint not null,b_one integer not null,b_all integer not null, u_id integer default -1, u_s_id integer default -1, v_id integer default -1, j_id integer default -1, w_ids text default '[]',constraint k_s foreign key (s_id) references s (id),constraint k_r foreign key (r_id) references r (id),constraint b_one_valid check(b_one > 0),constraint b_all_valid check(b_all >= b_one));
CREATE TABLE u (id integer primary key, name varchar(32) not null unique);
CREATE TABLE v(id integer primary key, k varchar(64) not null unique, name varchar(64));
CREATE TABLE j(id integer primary key, k varchar(64) not null unique, name varchar(64));
CREATE TABLE dat(id integer primary key, k varchar(64) not null unique, name varchar(64));

The actual query logic is exactly as I want it: grouped results with normalisation factors attached.
Between the grouping/aggregates, sub-queries, joins and JSON functions, I'm guessing the grouping can cripple it somehow.
For scale, it's ~100000 stored rows getting grouped down to ~100 result rows.

I've tried different SQLite versions, various C compiler flags and SQLite compile options.
Most tweaks seem to be about haggling over a few percent, nowhere near the order-of-magnitude difference I'm seeing.
The gap is so large that I could actually write my program to run the sqlite3 client as a subprocess and parse its output to get the results 100x faster.
I really don't want that to be the solution though!

It feels like it's recompiling the query statement on each call to sqlite3_step.
The documentation for sqlite3_prepare_v2 says that can happen, but only if the schema changed (it didn't) or a bound WHERE parameter changed (there are no parameters here).
Creating temporary views for those sub-queries also doesn't change anything.
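
The temp-view attempt looked roughly like this (view names just for illustration):

create temp view c_w as
select c_id, w_ids, 1.0/json_array_length(w_ids) ww from clist where w_ids!='[]';
create temp view c_n as
select c_id, 1.0/count() c_norm from clist group by c_id;
-- the main select then joins c_w and c_n in place of the inline sub-queries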

Either the Ubuntu package maintainers are using some clever compile options, or they've patched SQLite with an absolutely massive performance boost!

(2) By SeverKetor on 2025-01-10 04:38:38 in reply to 1 [link] [source]

The community crystal ball was lost in an unforeseen accident; you'll have to show us your C code. Presumably whatever mistake is causing the slowdown will be more readily apparent once the code is visible.

(3) By Mike Castle (nexushoratio) on 2025-01-10 06:29:24 in reply to 2 [link] [source]

Well, the OP did state this happens with the sqlite3 command line tool as well, when built locally. So I presume that code is probably available.

(8) By CMDR Vulkarius (CMDR_Vulkarius) on 2025-01-10 10:41:45 in reply to 2 [link] [source]

It's about as simple as it gets - with the file path and SQL string assigned earlier:

#include <iostream>
#include <string>
#include <sqlite3.h>

std::string db_path;
std::string query;

It does this:

int err;
sqlite3* db;

// Read-only open; db is still valid for sqlite3_errmsg/sqlite3_close if this fails.
err = sqlite3_open_v2(db_path.c_str(), &db, SQLITE_OPEN_READONLY, nullptr);
if (err != SQLITE_OK) {std::cerr << sqlite3_errmsg(db) << '\n'; sqlite3_close(db); return 1;}

sqlite3_stmt *stmt;
err = sqlite3_prepare_v2(db, query.c_str(), -1, &stmt, nullptr);
if (err != SQLITE_OK) {std::cerr << sqlite3_errmsg(db) << '\n'; sqlite3_close(db); return 1;}

// All of the time goes here: each sqlite3_step blocks ~100 ms per row.
while ((err = sqlite3_step(stmt)) == SQLITE_ROW)
{
	std::cout << "row\n";
}
if (err != SQLITE_DONE) {std::cerr << sqlite3_errmsg(db) << '\n'; sqlite3_finalize(stmt); sqlite3_close(db); return 1;}

sqlite3_finalize(stmt);
sqlite3_close(db);

Fetching the actual column data has no effect on the time taken, i.e. it's all in sqlite3_step.
The column contents are correct when fetched; fetching just isn't needed to trigger the per-row CPU hit.

(4.1) By Domingo (mingodad) on 2025-01-10 09:30:34 edited from 4.0 in reply to 1 [link] [source]

If for some reason there are "clever compile options" in your distribution's sqlite3 executable, then they are probably also in the shared sqlite3 library it provides. If you are using that library and still seeing a performance difference then, as SeverKetor pointed out, it's hard to say anything useful without seeing your C code.

(5) By Nuno Cruces (ncruces) on 2025-01-10 09:38:28 in reply to 1 [link] [source]

Compare the query plans between the two versions, and the compile options.

I'm guessing something like SQLITE_ENABLE_STAT4 leading to a better query plan in the CLI, but it's just a guess.
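
Both are visible from plain SQL in either environment, e.g. (with the real query in place of the placeholder):

explain query plan
select 1;               -- substitute the problem query; prints the chosen plan
pragma compile_options; -- one row per compile-time option, minus the SQLITE_ prefix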

(7) By CMDR Vulkarius (CMDR_Vulkarius) on 2025-01-10 10:21:27 in reply to 5 [link] [source]

Thank you for those pointers - many differences there, with both the plan output and the options output!
I'll try some experiments with recompiling my version to get them matching as closely as possible.
(Given the version difference, that is - I looked up a few of them and I can see that e.g. ENABLE_JSON1 is now implicit.)

In the meantime, the query plans (newer version is the one acting slowly)...

SQLite version 3.47.0
QUERY PLAN
|--CO-ROUTINE n
|  |--SCAN clist
|  `--USE TEMP B-TREE FOR GROUP BY
|--SCAN dat
|--SCAN n
|--BLOOM FILTER ON clist (c_id=?)
|--SEARCH clist USING AUTOMATIC PARTIAL COVERING INDEX (c_id=?)
`--SCAN w VIRTUAL TABLE INDEX 1: LEFT-JOIN
SQLite version 3.31.1
QUERY PLAN
|--MATERIALIZE 2
|  |--SCAN TABLE clist
|  `--USE TEMP B-TREE FOR GROUP BY
|--SCAN SUBQUERY 2 AS n
|--SEARCH TABLE clist USING AUTOMATIC COVERING INDEX (c_id=?)
|--SCAN TABLE json_each AS w VIRTUAL TABLE INDEX 1:
|--SEARCH TABLE dat USING INTEGER PRIMARY KEY (rowid=?)
`--USE TEMP B-TREE FOR GROUP BY

The compile options...

SQLite version 3.47.0
ATOMIC_INTRINSICS=1
COMPILER=gcc-9.4.0
DEFAULT_AUTOVACUUM
DEFAULT_CACHE_SIZE=-2000
DEFAULT_FILE_FORMAT=4
DEFAULT_JOURNAL_SIZE_LIMIT=-1
DEFAULT_MMAP_SIZE=0
DEFAULT_PAGE_SIZE=4096
DEFAULT_PCACHE_INITSZ=20
DEFAULT_RECURSIVE_TRIGGERS
DEFAULT_SECTOR_SIZE=4096
DEFAULT_SYNCHRONOUS=2
DEFAULT_WAL_AUTOCHECKPOINT=1000
DEFAULT_WAL_SYNCHRONOUS=2
DEFAULT_WORKER_THREADS=0
DIRECT_OVERFLOW_READ
HAVE_ISNAN
MALLOC_SOFT_LIMIT=1024
MAX_ATTACHED=10
MAX_COLUMN=2000
MAX_COMPOUND_SELECT=500
MAX_DEFAULT_PAGE_SIZE=8192
MAX_EXPR_DEPTH=1000
MAX_FUNCTION_ARG=127
MAX_LENGTH=1000000000
MAX_LIKE_PATTERN_LENGTH=50000
MAX_MMAP_SIZE=0x7fff0000
MAX_PAGE_COUNT=0xfffffffe
MAX_PAGE_SIZE=65536
MAX_SQL_LENGTH=1000000000
MAX_TRIGGER_DEPTH=1000
MAX_VARIABLE_NUMBER=32766
MAX_VDBE_OP=250000000
MAX_WORKER_THREADS=0
MUTEX_OMIT
SYSTEM_MALLOC
TEMP_STORE=1
THREADSAFE=0
SQLite version 3.31.1
COMPILER=gcc-9.4.0
ENABLE_COLUMN_METADATA
ENABLE_DBSTAT_VTAB
ENABLE_FTS3
ENABLE_FTS3_PARENTHESIS
ENABLE_FTS3_TOKENIZER
ENABLE_FTS4
ENABLE_FTS5
ENABLE_JSON1
ENABLE_LOAD_EXTENSION
ENABLE_PREUPDATE_HOOK
ENABLE_RTREE
ENABLE_SESSION
ENABLE_STMTVTAB
ENABLE_UNKNOWN_SQL_FUNCTION
ENABLE_UNLOCK_NOTIFY
ENABLE_UPDATE_DELETE_LIMIT
HAVE_ISNAN
LIKE_DOESNT_MATCH_BLOBS
MAX_SCHEMA_RETRY=25
MAX_VARIABLE_NUMBER=250000
OMIT_LOOKASIDE
SECURE_DELETE
SOUNDEX
THREADSAFE=1
USE_URI

(9) By ddevienne on 2025-01-10 10:45:58 in reply to 7 [link] [source]

I see a MATERIALIZE in the fast plan.

Recent SQLite versions have changed a bit in this regard, materializing less implicitly I think (I'm no expert),
and you can now use an explicit MATERIALIZED keyword on CTEs instead, so it could be linked.

There's also a BLOOM FILTER in the new plan, which can also cause problems.
There's a CLI-specific way to disable the Bloom filter, to see if it's the culprit.

(10) By Nuno Cruces (ncruces) on 2025-01-10 10:46:37 in reply to 7 [link] [source]

MATERIALIZE jumps out as relevant. Maybe use a MATERIALIZED CTE to get a more similar query plan.
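
Something like this, perhaps (untested; the MATERIALIZED hint needs SQLite 3.35+, and this just lifts the n sub-query into a CTE):

with n as materialized (
select c_id, 1.0/count() c_norm from clist group by c_id
)
select dat.id, dat.k, dat.name, sum(c.ww) weight, sum(c.ww*n.c_norm) norm
from (select c_id, w_ids, 1.0/json_array_length(w_ids) ww from clist where w_ids!='[]') c
join n on n.c_id=c.c_id
left join json_each(c.w_ids) w
join dat on w.value=dat.id
group by dat.id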

The change is probably simply due to the SQLite version. I'd try the exact same version to confirm. If so, maybe the SQLite developers want to consider the regression.

(12) By CMDR Vulkarius (CMDR_Vulkarius) on 2025-01-10 13:08:56 in reply to 10 [link] [source]

Thanks again - the query plan choice seems to be all about the SQLite version!

Moving the 2nd sub-query into a CTE and marking it MATERIALIZED indeed makes that part of the query plan match (including the sub-query's B-TREE FOR GROUP BY), though without affecting the speed.
The main differences remaining are the "partial" covering index, nothing replacing SEARCH TABLE dat USING INTEGER PRIMARY KEY, and no B-TREE FOR GROUP BY on the main query.

I also tried fiddling with the bloom filter (I found .testctrl optimizations 0x80000 in another thread) - that just removes the bloom filter line, again with no effect.
(It seems the bloom filter won't actually make a plan worse for having it, i.e. the problem is only in the choice of plan.)

I then tried the packaged sqlite3 CLI on a newer version of the OS (Linux Mint 22.0 vs 20.1, whichever Ubuntu package versions those correspond to).
That came with SQLite 3.45.1, and it chooses the slower query plan.
The version my server app uses is 3.42.0 - again the slower query plan.

I then tried downloading the SQLite source for 3.31.1 to match my older OS package - https://www.sqlite.org/src/info/3bfa9cc97da10598
Compiling that with all previous options removed and using only SQLITE_ENABLE_JSON1 (deprecated since 3.38 but needed here), my program now completes quickly.
Compiling the older shell, that also runs quickly and shows a matching plan for the original query.

I think from here I can probably deploy with a previous version, after making 100% sure all queries get prepared at startup to make it fail-fast if a query needs amending.
It'll be a bit arduous to bisect the version difference and repeat the speed test, although I'll do so once or twice as I'd prefer to limit how far the deployed version gets downgraded.

If it would help to test a possible regression, I can also try preparing an example database file with scrubbed data.
That'll take some time, but it'll be worth it if it improves future SQLite releases!

(13) By CMDR Vulkarius (CMDR_Vulkarius) on 2025-01-10 16:03:22 in reply to 10 [link] [source]

Update on the versions - I've found that 3.32.3 works quickly and 3.33.0 chooses a slow query plan.
These were compiled from the release revisions for those (consecutive) versions.

3.32.3 revision - https://www.sqlite.org/src/info/7ebdfa80be8e8e73
3.33.0 revision - https://www.sqlite.org/src/info/fca8dc8b578f215a

Older/faster plan...

SQLite version 3.32.3 2020-06-18 14:00:33
QUERY PLAN
|--MATERIALIZE 2
|  |--SCAN TABLE clist
|  `--USE TEMP B-TREE FOR GROUP BY
|--SCAN SUBQUERY 2 AS n
|--SEARCH TABLE clist USING AUTOMATIC COVERING INDEX (c_id=?)
|--SCAN TABLE json_each AS w VIRTUAL TABLE INDEX 1:
|--SEARCH TABLE dat USING INTEGER PRIMARY KEY (rowid=?)
`--USE TEMP B-TREE FOR GROUP BY

Newer/slower plan...

SQLite version 3.33.0 2020-08-14 13:23:32
QUERY PLAN
|--MATERIALIZE 2
|  |--SCAN TABLE clist
|  `--USE TEMP B-TREE FOR GROUP BY
|--SCAN TABLE dat
|--SCAN SUBQUERY 2 AS n
|--SEARCH TABLE clist USING AUTOMATIC COVERING INDEX (c_id=?)
`--SCAN TABLE json_each AS w VIRTUAL TABLE INDEX 1:

So the MATERIALIZE difference came later, as did the bloom filter, neither being the cause.
The operative differences are the loss of the rowid search on dat and the loss of the B-tree for the main query's grouping.

As mentioned, if there's interest in reviewing the query planner, I can make an example database file which reproduces the plan difference, if that would help.
In the meantime I've ported existing usage here away from insert/update queries with RETURNING (unavailable before 3.35.0).
It still needs a big sweep to be sure there aren't any inline statements being created, though - or some other way to get the query planner to make the faster choice!

(14) By Nuno Cruces (ncruces) on 2025-01-10 17:39:59 in reply to 13 [link] [source]

Beyond waiting for SQLite developers, I'd check if, in the newer releases, running a PRAGMA optimize (and compiling with SQLITE_ENABLE_STAT4) helps.
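
E.g., on a connection that can write to the database (PRAGMA optimize may run ANALYZE under the hood):

pragma optimize;   -- let SQLite decide which tables need fresh statistics
-- or gather statistics unconditionally:
analyze;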

(15.1) By CMDR Vulkarius (CMDR_Vulkarius) on 2025-01-10 23:25:13 edited from 15.0 in reply to 14 [link] [source]

STAT4 is indeed enabled, and the analysis features definitely aren't changing the query plan.
In finding the documentation pages related to the suggestions here and following the numerous links, I found something which has actually cured the query plan though!
The STAT4 topic led me to the documents about the query planner and its stability, ultimately to this part - https://sqlite.org/queryplanner-ng.html#howtofix

From that list I tried fiddling with indexes and running ANALYZE; then, reading past the note that ideally you should stop reading, there's an excellent point about CROSS JOIN.
Everything here rests on join order and SQLite's nested-loop iteration: the query has a logical order for a human thought process, but the planner is free to reorder it.
Looking back at the good versus bad query plans, newer SQLite demotes json_each to last place - the one thing I can't (knowingly, at present) index.
It's difficult to spot because newer SQLite (~3.40) removes the mention of json_each from the query plan report, leaving only the dummy table name.
Changing just that final JOIN into CROSS JOIN, so the json_each result is no longer last, has cured it on SQLite versions from 3.33.0 to 3.47.0.
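
For clarity, that's the original query with just the last join changed:

select dat.id, dat.k, dat.name, sum(c.ww) weight, sum(c.ww*n.c_norm) norm
from (select c_id, w_ids, 1.0/json_array_length(w_ids) ww from clist where w_ids!='[]') c
join (select c_id, 1.0/count() c_norm from clist group by c_id) n on n.c_id=c.c_id
left join json_each(c.w_ids) w
cross join dat on w.value=dat.id
group by dat.id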

Adding that CROSS JOIN has:

  • Retained CO-ROUTINE rather than forcing MATERIALIZE (good - guessing that means lower memory footprint to run the query).
  • Retained the bloom filter.
  • Changed the dat table from SCAN to SEARCH and placed it last.
  • Restored the top-level B-tree for grouping.

I'm fairly sure now that the indexing woes come from having json_each in the middle of the join, with newer planners promoting the indexed table to its right above json_each's place in the query.
That's probably the trouble with making the json_each call the most-nested iteration layer: it then gets called thousands of times with the same bit of JSON markup.
In practice its output only varies with columns to the left of its position in the query.
Actually, I'd conjecture that query plans ought always to place table-valued functions at the earliest position at which their result can be decided: dependencies to the left, non-essentials to the right.

In any case, huge thanks for the various pointers - it's all good to use newer versions of SQLite now!
Having seen it work fine in past versions and having used a below-the-stop-line fix, I'm guessing that makes it a genuine regression to be considered.
Likely not the most important of conceivable to-do items, as it's not a usual everyday query, but I'm happy to help with a test case if it attracts development interest.

(6.1) By ddevienne on 2025-01-10 11:34:18 edited from 6.0 in reply to 1 [link] [source]

Nitpick, but your query references clist.c_id yet your table def does not have that column.
Since you join on that column, I wanted to see if it was indexed. Thus I'm confused...

As Nuno wrote, compare the query plans, and post them.
Perhaps it's linked to _STAT1 vs _STAT4, as Nuno suspects.
You could also try ANALYZE, to see if the plan changes for either your compiled version or the official CLI.
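
E.g.:

analyze;                    -- writes stats into sqlite_stat1 (and sqlite_stat4 when enabled)
select * from sqlite_stat1; -- the statistics the planner will now consult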

In any case, I'm interested in the outcome of your investigation.

(11) By CMDR Vulkarius (CMDR_Vulkarius) on 2025-01-10 11:02:40 in reply to 6.0 [link] [source]

I messed up sanitising the names - the u_id in the clist schema is that c_id.
In theory it references a foreign key, in practice I didn't want to enforce that because the group of entries with the same c_id is still valid even if the foreign entry is deleted.
A good analogy is resource access stats which are stored per-user but reported anonymously and still valid if the user ever wants all P.I.I. deleted.

ANALYZE doesn't change either query plan - it's looking like a case of fiddling with compile options.
I didn't know about "pragma compile_options" before, so I'm hoping I can replicate the package client now that I'm armed with a checklist!