SQLite Forum

bad plan for a query with an OR condition
Login

bad plan for a query with an OR condition

(1) By Juraj Ivančić (jivancic) on 2020-12-21 13:29:26 [link] [source]

Hi all,

I've run into a surprisingly slow query and it seems that sqlite query planner doesn't figure out what seems to be a simple optimization.

Below is a use case where

SELECT x FROM t WHERE c1 OR c2

is incredibly slow, where equivalent, but much less clear

SELECT x FROM t WHERE c1 UNION ALL SELECT x FROM t WHERE c2

is lightning fast.

Here's the sqlite console output:

# sqlite3
SQLite version 3.34.0 2020-12-01 16:14:00
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> create table file(id integer primary key autoincrement, name TEXT);
sqlite> insert into file(id) select value from generate_series(1, 10 * 1000 * 1000);
sqlite> create index file_index on file ("name");
sqlite> create table event(id integer primary key autoincrement, file_id integer, new_name TEXT);
sqlite> .timer on
sqlite> .eqp on
sqlite> select id from file where name = 'aaa.txt';
QUERY PLAN
`--SEARCH TABLE file USING COVERING INDEX file_index (name=?)
Run Time: real 0.001 user 0.000000 sys 0.000000

Here comes the slow query (note that the event table is empty):

sqlite> select file.id from file left join event on file.id = event.file_id where coalesce(event.new_name, file.name) = 'aaa.txt';
QUERY PLAN
|--SCAN TABLE file
`--SEARCH TABLE event USING AUTOMATIC COVERING INDEX (file_id=?)
Run Time: real 1.589 user 1.593750 sys 0.000000

Equivalent query, that avoids using coalesce, is also slow.


sqlite> select file.id from file left join event on file.id = event.file_id where event.new_name = 'aaa.txt' OR (event.new_name IS NULL AND file.name = 'aaa.txt');
QUERY PLAN
|--SCAN TABLE file
`--SEARCH TABLE event USING AUTOMATIC COVERING INDEX (file_id=?)
Run Time: real 1.731 user 1.734375 sys 0.000000

Now this above is what I find surprising. If broken into two queries, both are lightning fast.

sqlite> select file.id from file left join event on file.id = event.file_id where event.new_name = 'aaa.txt';
QUERY PLAN
|--SCAN TABLE event
`--SEARCH TABLE file USING INTEGER PRIMARY KEY (rowid=?)
Run Time: real 0.001 user 0.000000 sys 0.000000
sqlite> select file.id from file left join event on file.id = event.file_id where event.new_name IS NULL AND file.name = 'aaa.txt';
QUERY PLAN
|--SEARCH TABLE file USING COVERING INDEX file_index (name=?)
`--SCAN TABLE event
Run Time: real 0.001 user 0.000000 sys 0.000000

And finally, the two fast queries written as a compound (UNION ALL):

sqlite> select file.id from file left join event on file.id = event.file_id where event.new_name = 'aaa.txt' UNION ALL select file.id from file left join event on file.id = event.file_id WHERE event.new_name IS NULL and file.name = 'aaa.txt';
QUERY PLAN
`--COMPOUND QUERY
   |--LEFT-MOST SUBQUERY
   |  |--SCAN TABLE event
   |  `--SEARCH TABLE file USING INTEGER PRIMARY KEY (rowid=?)
   `--UNION ALL
      |--SEARCH TABLE file USING COVERING INDEX file_index (name=?)
      `--SCAN TABLE event
Run Time: real 0.001 user 0.015625 sys 0.000000

Rewriting my real-life query (which has 3 similar joins instead of 1) to such a UNION, even if possible, will yield an incomprehensible mess and I'd rather avoid that. Is there another way to speed this up?

Thank you for any insight,

Juraj

(2) By Richard Hipp (drh) on 2020-12-21 14:45:48 in reply to 1 [link] [source]

Those are not equivalent queries. Here is an example:

CREATE TABLE t(x INT,c1 INT,c2 INT);
INSERT INTO t VALUES
  (1,0,0),
  (2,1,0),
  (3,0,1),
  (4,1,1);
SELECT x FROM t WHERE c1 OR c2;
SELECT x fROM t WHERE c1
  UNION ALL SELECT x FROM t WHERE c2;

Copy/paste the above in the CLI (or into PostgreSQL if you doubt that SQLite is doing it correctly) and see that the second query provides 4 rows of output whereas the first query provides only 3.

(3.1) By Juraj Ivančić (jivancic) on 2020-12-21 15:50:02 edited from 3.0 in reply to 2 [source]

I completely trust that SQLite is computing it correctly. I apologize for being imprecise - yes I'm aware UNION ALL contains duplicates. To be really equivalent I should have used UNION. The point still stands though, there's practically no difference in duration between UNION & UNION ALL - they complete in a few milliseconds, compared to the problematic query taking seconds.

sqlite> select file.id from file left join event on file.id = event.file_id where event.new_name = 'aaa.txt' UNION select file.id from file left join event on file.id = event.file_id WHERE event.new_name IS NULL and file.name = 'aaa.txt';
QUERY PLAN
`--COMPOUND QUERY
   |--LEFT-MOST SUBQUERY
   |  |--SCAN TABLE event
   |  `--SEARCH TABLE file USING INTEGER PRIMARY KEY (rowid=?)
   `--UNION USING TEMP B-TREE
      |--SEARCH TABLE file USING COVERING INDEX file_index (name=?)
      `--SCAN TABLE event
Run Time: real 0.002 user 0.000000 sys 0.000000

(4) By Richard Hipp (drh) on 2020-12-21 15:45:44 in reply to 3.0 [link] [source]

I'm sorry the original query is taking too long. I don't have a solution to the problem right now.

(5) By David Raymond (dvdraymond) on 2020-12-21 15:46:05 in reply to 3.0 [link] [source]

> To be really equivalent I should have used UNION.

Nope. Even that is not equivalent. See below.
"where a or b" returns 3 rows
"where a union all where b" returns 4
"where a union where b" returns 2


SQLite version 3.34.0 2020-12-01 16:14:00
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.

sqlite> create table foo (a, b);

sqlite> insert into foo values (1, 1), (1, 0), (0, 1);

sqlite> .eqp on

sqlite> select a from foo where a or b;
QUERY PLAN
`--SCAN TABLE foo
a
1
1
0

sqlite> select a from foo where a union all select a from foo where b;
QUERY PLAN
`--COMPOUND QUERY
   |--LEFT-MOST SUBQUERY
   |  `--SCAN TABLE foo
   `--UNION ALL
      `--SCAN TABLE foo
a
1
1
1
0

sqlite> select a from foo where a union select a from foo where b;
QUERY PLAN
`--COMPOUND QUERY
   |--LEFT-MOST SUBQUERY
   |  `--SCAN TABLE foo
   `--UNION USING TEMP B-TREE
      `--SCAN TABLE foo
a
0
1

sqlite>

(6) By Holger J (holgerj) on 2020-12-22 13:29:50 in reply to 2 [link] [source]

PostgreSQL 13 shows the behaviour below. Of course, the data type had to be a proper BOOLEAN, since an INT cannot be checked for truth. And I added an ORDER BY 1 to allow for easier comparison.

CREATE TABLE t(x INT,c1 boolean,c2 boolean);
INSERT INTO t VALUES
  (1,false,false),
  (2,true,false),
  (3,false,true),
  (4,true,true);

SELECT x FROM t 
WHERE c1 OR c2 
ORDER BY 1;
┌───┐
│ x │
├───┤
│ 2 │
│ 3 │
│ 4 │
└───┘


SELECT x fROM t WHERE c1
UNION ALL 
SELECT x FROM t WHERE c2
ORDER BY 1;

┌───┐
│ x │
├───┤
│ 2 │
│ 3 │
│ 4 │
│ 4 │
└───┘


SELECT x fROM t WHERE c1
UNION
SELECT x FROM t WHERE c2
ORDER BY 1;


┌───┐
│ x │
├───┤
│ 2 │
│ 3 │
│ 4 │
└───┘

(7.1) By David Raymond (dvdraymond) on 2020-12-22 13:55:30 edited from 7.0 in reply to 6 [link] [source]

But you made x different for each record. Make it the same and see what happens:

psql (12.0)
Type "help" for help.

testing=> create temp table t (x int, c1 boolean, c2 boolean);
CREATE TABLE
Time: 115.133 ms
testing=> insert into t values (1, false, false), (1, false, true), (1, true, false), (1, true, true);
INSERT 0 4
Time: 0.954 ms
testing=> select x from t where c1 or c2 order by 1;
 x
---
 1
 1
 1
(3 rows)

Time: 59.461 ms
testing=> select x from t where c1 union all select x from t where c2 order by 1;
 x
---
 1
 1
 1
 1
(4 rows)

Time: 0.443 ms
testing=> select x from t where c1 union select x from t where c2 order by 1;
 x
---
 1
(1 row)

Time: 11.495 ms
testing=>