SQLite Forum

LEFT JOIN performance regression in 3.41.1
Login

LEFT JOIN performance regression in 3.41.1

(1) By Mark Brand (mabrand) on 2023-03-11 18:26:40 [source]

I note a huge loss in peformance in 3.41.1 for some LEFT JOINS. I suspect the index on the right side of the join is not being used. Below is a minimal case showing this effect.

regards,
Mark

-- 3410000 : 0m0,025s
-- 3410100 : 0m10,590s

CREATE TEMP VIEW G AS
    WITH RECURSIVE
    cnt(n) AS (VALUES(1) UNION ALL SELECT n + 1 FROM cnt WHERE n < 10000)
    SELECT n FROM cnt;

CREATE TEMP TABLE T (pk INT PRIMARY KEY);

INSERT INTO T (pk) SELECT n FROM G;

CREATE TEMP VIEW V AS SELECT * FROM T;

SELECT *
FROM T
LEFT JOIN V ON V.pk = T.pk;

(2) By Richard Hipp (drh) on 2023-03-11 21:20:38 in reply to 1 [link] [source]

Bisect says that the problem was introduced by check-in 198b3e33dcfd74c7 which fixed a bug reported by forum thread 26387ea7ef.

I'll look into this. In the meantime, I suggest you continue using version 3.41.0 for your particular application.

(3) By Mark Brand (mabrand) on 2023-03-11 22:10:02 in reply to 2 [link] [source]

Thanks. I played around with this to come up with a slightly more minimal case, without the self-join. The critical factor seems to be wrapping the right side of the left join in a view. Also found the effect goes away with SELECT DISTINCT.


CREATE TEMP VIEW G AS
WITH RECURSIVE
    cnt(n) AS (VALUES(1) UNION ALL SELECT n + 1 FROM cnt WHERE n < 10000)
    SELECT n FROM cnt;

CREATE TEMP TABLE T (pk INT PRIMARY KEY);

INSERT INTO T (pk) SELECT n FROM G;

CREATE TEMP VIEW V AS SELECT * FROM T;

SELECT * FROM G LEFT JOIN V ON V.pk = 0;

(4) By Richard Hipp (drh) on 2023-03-11 22:54:27 in reply to 3 [link] [source]

This is the test script that I am using:

.echo on
.mode qbox
CREATE TABLE t1(a INTEGER PRIMARY KEY);
WITH RECURSIVE c(n) AS (VALUES(1) UNION ALL SELECT n+1 FROM c WHERE n<100)
  INSERT INTO t1(a) SELECT n FROM c;
CREATE VIEW t2(b) AS SELECT a FROM t1;
.eqp on
.stat vmstep
SELECT * FROM t1 LEFT JOIN t2 ON a=b LIMIT 5 OFFSET 95;

The ".eqp on" line causes the SELECT to show its query plan. And the ".stat vmstep" line causes a count of the number of byte-code engine steps needed to complete the query. On 3.41.1, the query plan does not use an index and requires 51,445 steps. On 3.41.0 and earlier, and index is used and requires 729 steps.

FWIW, the LIMIT-OFFSET clause is just to prevent the output from taking up too much vertical space on my screen when I run the script.

(5.2) By Keith Medcalf (kmedcalf) on 2023-03-11 23:19:17 edited from 5.1 in reply to 4 [link] [source]

You could just replace the * with count(*) then you do not need limit nor offset. (The optimizer does not know that the answer will always be the number of rows in the left table of the left join (or right table of a right join), so performs the join anyway).

(6) By Richard Hipp (drh) on 2023-03-11 23:30:58 in reply to 1 [link] [source]

I believe the performance issue is now resolved, both on trunk and on branch-3.41.

(7) By Mark Brand (mabrand) on 2023-03-11 23:57:17 in reply to 6 [link] [source]

I applied your patch to the sqlite-autoconf-3410100 tarball sources, and it works for me. Amazing work for a Saturday evening! Can I send you a pizza or something?