SQLite User Forum: Reporting performance issue

Thanks for posting these. Very interesting indeed.

All of them seem to be cases where the planner should be able to determine that
no work (or very little) is required. I guess that's a product of your 1000x
threshold - it's hard to be 1000x slower unless you miss a really good
shortcut! I think instances where one db is consistently more than (say) 10x
slower would be quite interesting as well.

Anyway, for the record these all seem to come down to two things:

(1) SQLite is not taking advantage of NOT NULL constraints when processing IS NULL. With SQLite, if you do:

        CREATE TABLE t1(a NOT NULL);
        SELECT * FROM t1 WHERE a IS NULL;

SQLite does a full-table scan. But it should be possible to determine at compile-time that any "col IS NULL" expression for which col has a NOT NULL constraint and does not come from a table on the right-hand-side of a LEFT JOIN is always false.

(2) SQLite is not ignoring columns in the result-set of an EXISTS(SELECT ...) when determining whether or not the SELECT is correlated. i.e. if you do:

        CREATE TABLE t1(a, b);
        CREATE TABLE t2(x, y);
        SELECT * FROM t1 WHERE EXISTS (SELECT a FROM t2 WHERE x=1);

then SQLite runs the sub-query once for every row of t1 because it thinks the "a" in the sub-query makes it correlated. It doesn't - the sub-query only has to be run once.

Naive changes to support the two optimizations above fix most of these examples. But there are a couple of tricks. For example, query 1753 features the equivalent of:

        CREATE TABLE t1(a, b);
        CREATE TABLE t2(x, y NOT NULL);
        SELECT * FROM t1 LEFT JOIN t2 WHERE t1.a=t2.x AND y IS NULL;

In the query, the LEFT JOIN can be transformed to a regular join (because t1.a=t2.x implies that t2.x cannot be NULL) which allows the database to determine that "y IS NULL" will always be false. But currently, by the time SQLite realizes it can transform the join it's tricky to tell exactly which LEFT JOIN (there may be many in the query) column "y" came from.

There's a similar problem in 1948:

        CREATE TABLE t1(a, b);
        CREATE TABLE t2(x, y NOT NULL);
        CREATE TABLE t3(c, d);
        SELECT * FROM t1 LEFT JOIN t2 WHERE t1.a=t2.x EXISTS (SELECT 1 FROM t3 WHERE t2.y IS NULL OR t3.c=10);

By the time the LEFT JOIN is transformed to a regular join, the sub-select has already been marked as correlated. So even though the "t2.y IS NULL" is transformed to "false", making the sub-select uncorrelated, SQLite still runs it once for every row visited by the query.

Apparently, other database engines do better in these cases.

I'm not sure how much of a rush we'll be in to fix these specific cases. Something for after 3.35.0 anyway.

Thanks,

Dan.