Query Planner and propagation of primary key through view

(1) By anonymous on 2020-07-09 08:21:26 [link] [source]

While investigating a slow query, I noticed that the use of a view made the query planner generate an unexpected plan. Consider the following tables and view:

CREATE TABLE table_one (table_one_pk INTEGER PRIMARY KEY);

CREATE TABLE table_two (
  table_two_pk INTEGER PRIMARY KEY,
  table_one_fk INTEGER,
  value INTEGER,
  flag INTEGER,
  FOREIGN KEY (table_one_fk) REFERENCES table_one(table_one_pk));

CREATE VIEW table_two_filter AS
SELECT table_two_pk, table_one_fk, value
FROM table_two
WHERE flag = 15;

'table_two' is typically an order of magnitude bigger than 'table_one'. The view exists because nearly all use cases need to filter on a fixed value of the flag.

When running the following query, the query plan lists an unexpected full scan of 'table_two':

SELECT table_one.table_one_pk, MAX(table_two_filter.value)
FROM table_one
LEFT OUTER JOIN table_two_filter ON table_two_filter.table_one_fk = table_one.table_one_pk
GROUP BY table_one.table_one_pk;

QUERY PLAN
|--MATERIALIZE 2
|  `--SCAN TABLE table_two
|--SCAN TABLE table_one
`--SEARCH SUBQUERY 2 USING AUTOMATIC COVERING INDEX (table_one_fk=?)

The equivalent query, with the flag filtering listed with the join constraints, generates the plan I would expect:

SELECT table_one.table_one_pk, MAX(table_two.value)
FROM table_one
LEFT OUTER JOIN table_two ON table_two.table_one_fk = table_one.table_one_pk
  AND table_two.flag = 15
GROUP BY table_one.table_one_pk;

QUERY PLAN
|--SCAN TABLE table_one
`--SEARCH TABLE table_two USING AUTOMATIC COVERING INDEX (table_one_fk=? AND flag=?)

Here's a bit of sample data to play with:

INSERT INTO table_one VALUES (1);
INSERT INTO table_one VALUES (2);
INSERT INTO table_two VALUES (101, 1, 1001, 15);
INSERT INTO table_two VALUES (102, 1, 1002, 15);
INSERT INTO table_two VALUES (103, 1, 1003, 14);

(2) By Dan Kennedy (dan) on 2020-07-09 14:38:03 in reply to 1 [link] [source]

It's not obvious by any means, but in practice the first plan is actually better than the second. Perhaps this is an optimization opportunity.

In both cases, SQLite builds a temporary index on table_two. In the first plan, where the view is materialized first, this index ends up equivalent to:

  CREATE INDEX tmp ON table_two(table_one_fk, flag, value) WHERE flag=15;

and with the second plan:

  CREATE INDEX tmp ON table_two(table_one_fk, flag, value);

in both cases SQLite has to do a linear scan of table_two to build the temporary index. But in the first case there are fewer rows to insert into the index, so that plan is most likely better.

Also interesting is that in the first case, SQLite drops the rows into a temporary table first ("materializing" the view) before reading them back out of that table to create the temporary index.

(3) By anonymous on 2020-07-09 15:47:40 in reply to 2 [link] [source]

In reducing the problem to a minimal test case I seem to have forgotten a piece of the puzzle. There was actually an index on the foreign key:

CREATE INDEX index_table_two_table_one_fk ON table_two(table_one_fk);

This removes the need for the temporary index in the second query, which becomes:

QUERY PLAN
|--SCAN TABLE table_one
`--SEARCH TABLE table_two USING INDEX index_table_two_table_one_fk (table_one_fk=?)

This looks optimal. It doesn't change the query plan of the first query however, which I find surprising.

Only when I add the flag filtering to the index (thanks Dan, I hadn't realized this was an option), does the plan change, but it still materialises the view:

CREATE INDEX index_table_two_table_one_fk ON table_two(table_one_fk) WHERE flag = 15;

QUERY PLAN
|--MATERIALIZE 2
|  `--SCAN TABLE table_two USING INDEX index_table_two_table_one_fk
|--SCAN TABLE table_one
`--SEARCH SUBQUERY 2 USING AUTOMATIC COVERING INDEX (table_one_fk=?)

Is there a way to avoid this?

The purpose of the view is to avoid user error, but its use is harder to recommend when it can cause unexpected behaviour.

(4) By Dan Kennedy (dan) on 2020-07-09 18:09:06 in reply to 3 [source]

Is there a way to avoid this?

It's the LEFT JOIN that is stopping SQLite from transforming the query to the second form internally. Is there a way to rewrite the query that doesn't use a LEFT JOIN?

(5) By Keith Medcalf (kmedcalf) on 2020-07-09 18:50:48 in reply to 1 [link] [source]

You are correct. The optimizer does not push the conditions attached to a table-subquery on the RHS of a left-join into the outer join conditions bound to the target table. In this case it probably could. I do not know if it should do that optimization in all cases or not, nor even if it will be beneficial in most cases.

Is there a particular reason you are using an outer join or is it just because?