SQLite Forum

Leaf table pruning in Queries (Enhancement request for comment)
Login

Leaf table pruning in Queries (Enhancement request for comment)

(1) By Keith Medcalf (kmedcalf) on 2020-07-26 17:45:37 [link] [source]

Presently, the SQLite3 optimizer "prunes" tables from queries where that table is a leaf that does not have columns that are used elsewhere in the query, but only for outer (LEFT) joins, but not for inner joins.

Is it possible to extend this optimization to also consider inner joined tables as well as outer?

Example when using the following definitions:

create table m
(
  a integer not null references a,
  b integer not null references b,
  c integer not null references c
);
create table a
(
  id integer primary key,
  va text
);
create table b
(
  id integer primary key,
  vb text
);
create table c
(
  id integer primary key,
  vc text
);
create view v1
as
    select *
      from m
 left join a on a.id = m.a
 left join b on b.id = m.b
 left join c on c.id = m.c;
create view v2
as
    select *
      from m
      join a on a.id = m.a
      join b on b.id = m.b
      join c on c.id = m.c;
select va from v1 where vc = 'something';
select va from v2 where vc = 'something';

We get the following query plans:

sqlite> select va from v1 where vc = 'something';
QUERY PLAN
|--SCAN TABLE m (~1048576 rows)
|--SEARCH TABLE a USING INTEGER PRIMARY KEY (rowid=?) (~1 row)
`--SEARCH TABLE c USING INTEGER PRIMARY KEY (rowid=?) (~1 row)
sqlite> select va from v2 where vc = 'something';
QUERY PLAN
|--SCAN TABLE m (~1048576 rows)
|--SEARCH TABLE a USING INTEGER PRIMARY KEY (rowid=?) (~1 row)
|--SEARCH TABLE c USING INTEGER PRIMARY KEY (rowid=?) (~1 row)
`--SEARCH TABLE b USING INTEGER PRIMARY KEY (rowid=?) (~1 row)

Note that table b is pruned from the query plan when it is an outer join, but not when it is an inner join, despite contributing nothing to the result.

I do note that in order to be able to know that table b can be pruned it is necessary to be able to prove that the lookup from m -> b cannot affect the result which is guaranteed by the not null and references constraints, but I do not know if this information is available to the QP in a usable format or how difficult such a prover would be to construct.

(2) By Richard Hipp (drh) on 2020-07-26 18:05:00 in reply to 1 [source]

table b is pruned from the query plan when it is an outer join, but not when it is an inner join, despite contributing nothing to the result

Table b does contribute to the result when it is an inner join by restricting output rows to those for which the m.b value has a corresponding entry in b.id. It is as if there was an additional WHERE clause term:

 ... AND m.b IN (SELECT id FROM b)

(3) By Keith Medcalf (kmedcalf) on 2020-07-26 18:22:13 in reply to 2 [link] [source]

Except, of course, that m.b is always constrained such that

... AND m.b IN (SELECT id FROM b)

is always True because m.b is constrained NOT NULL and also constrained to only contain values in (SELECT id FROM b) by the foreign key constraint. Assuming, of course, that the database integrity is not "busted".

(4) By Keith Medcalf (kmedcalf) on 2020-07-26 19:23:07 in reply to 2 [link] [source]

The real utility is however in allowing the elimination of tables that provide nothing of value, though those are probably more difficult to identify since there is no direct way to phrase a query which would obtain the same result. For example:

create table a(id integer primary key);
create table b(va integer not null references a(id), vb);
create table c(va integer not null references a(id), vc);

select vc from a, b, c where b.va = a.id and c.va = a.id and vb = 4;

could eliminate the common leaf table a by treating the query as:

select vc from b, c where b.va = c.va and vb = 4;

since you know that table a provides no utility as long as you also can substitute the transitive equality b.va = c.va. This would save the extraneous descent though table a. I don't know if the QP knows that b.va == c.va or not and thus could eliminate table a entirely (ie, could the current planner choose the loop order b -> c -> a such that a becomes a leaf which could be eliminated).