SQLite Forum

SQLite 3.40.0 performance regression involving UNION after JOIN
Login

SQLite 3.40.0 performance regression involving UNION after JOIN

(1) By Mark Brand (mabrand) on 2022-12-08 16:43:03 [source]

-- Demonstration of SQLite 3.40.0 performance regression involving UNION after JOIN

DROP VIEW IF EXISTS N;
CREATE VIEW N AS
    WITH RECURSIVE
    cnt(n) AS (VALUES(1) UNION ALL SELECT n + 1 FROM cnt WHERE n < 100000)
    SELECT n FROM cnt;

DROP TABLE IF EXISTS Animal;
CREATE TABLE Animal (name TEXT PRIMARY KEY );
INSERT INTO Animal (name) SELECT 'Fido ' || n FROM N WHERE n < 10000;

DROP TABLE IF EXISTS Zone;
CREATE TABLE Zone (
    zone TEXT PRIMARY KEY,
    species TEXT
);
INSERT INTO Zone (zone, species) SELECT 'Zone ' || n, 'dog' FROM N WHERE n < 10000;

-- fast as expected
SELECT DISTINCT a.name, z.species FROM Animal a, Zone z WHERE name = 'Fido 1';

-- UNION very slow on SQLite 3.40.0
SELECT * FROM (
    SELECT a.name, z.species FROM Animal a, Zone z
    UNION SELECT NULL, NULL WHERE 0
)
WHERE name = 'Fido 1';

(2) By Keith Medcalf (kmedcalf) on 2022-12-08 17:51:44 in reply to 1 [link] [source]

This was discussed already. A subquery containing a compound now acts as an optimization boundary and outer "where" conditions are no longer pushed into the branches of the compound.

(3) By Gary (1codedebugger) on 2022-12-08 18:39:52 in reply to 2 [link] [source]

Are you referring to this question "v3.40.0 very slow with UNIONs (or so I think)"? https://sqlite.org/forum/info/282a2913aef5df3c

(5) By Richard Hipp (drh) on 2022-12-08 18:50:36 in reply to 2 [link] [source]

The fix for that at check-in adbca3448e2099f0 is cleverly defeated by Brand's use of a constant NULL column in the second SELECT of the UNION. I'm working on a better fix now....

(4) By anonymous on 2022-12-08 18:49:31 in reply to 1 [link] [source]

As Keith noted, it's been discussed here: https://sqlite.org/forum/forumpost/3824ced748baa808

The sqlite timeline here ( https://www.sqlite.org/src/timeline?r=coroutines-exp2&c=2022-12-08+16%3A47%3A16) shows they are looking at query performance, particularly  expressions similar to your example.

Note that if you look at your query plans you will likely see that in 3.40 they are doing table scans rather than using searches for those inner UNIONS.

If you really need the slowdown backed out, you can look at restriction 9 (at around line 5085 - 5087 in select.c (refer to this check in: https://sqlite.org/src/info/1ad41840c5e0fa70).  Of course this is NOT recommended as it would keep 17h, but essentially revert 9, so your sqlite behavior will be technically incorrect.  So this paragraph will (rightfully so) draw criticism of DON'T DO IT, and they will be right!  Yet you can back out the change and examine the query plan changes to confirm that is the source of the 'slow down'.  You would have to build from source, rather than amalgamation to do this hack.  Again, for testing, not recommended for production.

Once you are confident that it's the lack of push down optimisation your path forward is a bit more clear.  You can rewrite the query to manually push down the where clause (put the where on both the outer and inner union).  It will still end up doing outer loop extra work, but you may find the performance temporarily satifactory.

As you can see from the timeline, Dr Hipp and the sqlite folks are well aware of the performance issue, and are working on optimisations to mitigate the 'fix' of the long standing technical incorrect push down that was happening in restriction 9.

(6) By anonymous on 2022-12-08 18:57:35 in reply to 1 [link] [source]

Further, you will note that the restriction 9 of the push down optimisation requires that affinity be the same for both sides of the union. This means you can try to trigger the push down optimisation (and meet requirements of restriction 9) by using cast(n+1 as xxxx), however there are some issues with column affinity and views (see https://www.sqlite.org/src/info/679ed6a282a97c51).

As I noted in previous reply, Dr Hipp is working on the performance issue so a perfectly rational approach is also.....wait a bit. :)