SQLite: Check-in [2cbbabdf5e]


Overview

Comment:	When the query planner has the opportunity to use an IN operator constraint on a term of an index other than the left-most term, use the estimated number of elements on the right-hand side of the IN operator to determine whether it makes sense to use the IN operator with index lookups, or simply to scan the range of the table identified by the index terms to the left. Only do this if sqlite_stat1 measurements are available, as otherwise the performance estimates will not be accurate enough to discern the best plan. Bias the decision slightly in favor of using index lookups on each element of the IN operator.
SHA3-256:	2cbbabdf5ef624d809fbb40d2d312a29e0b5f02756fc0dbf6985fc8b0c8d1ade
User & Date:	drh 2018-06-08 23:23:53.721
Original Comment:	When the query planner has the opportunity to use an IN operater constraint on a term of an index other than the left-most term, use the estimated number of elements on the right-hand side of the IN operator to determine if makes sense to use the IN operator with index looks, or to just do a scan over the range of the table identified by the index terms to the left. Only do this if sqlite_stat1 measurements are available as otherwise the performance estimates will not be accurate enough to discern the best plan. Bias the decision slightly in favor of using index lookups on each element of the IN operator.
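The decision described in the comment can be illustrated with a rough cost model. The sketch below is an illustration under stated assumptions, not the code in src/where.c: it assumes that K index lookups cost about `K * log2(N)` comparisons (N = rows in the table), that scanning the M rows matched by the index terms to the left costs about `M * log2(K)` (each row tested against a K-entry IN list), and that the "slight bias" toward lookups is a factor of 2 applied to the scan estimate. The function `prefer_scan` and its parameters are hypothetical.

```python
import math


def prefer_scan(n_rows: int, m_rows: float, k_in: int, has_stat1: bool,
                bias: float = 2.0) -> bool:
    """Hypothetical sketch: should the IN term be tested during a range scan
    instead of driving one index lookup per IN element?

    Assumed cost model (not SQLite's actual where.c logic):
      lookups: k_in * log2(n_rows)   (one binary search per IN element)
      scan:    m_rows * log2(k_in)   (test each scanned row against the IN list)
    `bias` inflates the scan estimate so that close calls favor index lookups.
    """
    if not has_stat1:
        # No sqlite_stat1 data: the estimates are too rough, so keep the lookups.
        return False
    lookup_cost = k_in * math.log2(n_rows)
    # Treat log2(K) as at least 1 so a tiny IN list is not costed as free.
    scan_cost = bias * m_rows * max(math.log2(k_in), 1.0)
    return scan_cost < lookup_cost
```

For example, `prefer_scan(1_000_000, 50, 100, True)` compares 100 lookups of roughly 20 comparisons each (about 1993 in total) with a 50-row scan costed at `2 * 50 * log2(100)` (about 664), so it returns True; with `has_stat1=False` the sketch always keeps the index lookups.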

About

This optimization breaks down if the table statistics are highly non-linear, that is, if a handful of key values account for far more rows than the average suggests. Consider the following scenario:

  CREATE TABLE t1(a,b,c);
  WITH RECURSIVE 
     c1(x) as (VALUES(1) UNION ALL SELECT x+1 FROM c1 WHERE x<5000),
     c2(y) as (VALUES(0) UNION ALL SELECT y+1 FROM c2 WHERE y<10)
  INSERT INTO t1(a,b,c) SELECT x*1000, 2*(y/4), x*1000+y FROM c1, c2;
  WITH RECURSIVE 
     c1(x) as (VALUES(1) UNION ALL SELECT x+1 FROM c1 WHERE x<50000)
  INSERT INTO t1(a,b,c) SELECT 4000, 2*x, 99 FROM c1 WHERE true;
  CREATE INDEX t1ab ON t1(a,b);
  ANALYZE;

The resulting table has 105000 rows in total, spread over 5000 distinct "a" values, so each "a" value repeats 21 times on average. But most of those rows have "a=4000": there are 50011 rows with "a=4000", while every other "a" value has only 11 rows. The query planner, however, sees only the average of 21.
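The row counts quoted above can be checked by running the same script through Python's built-in `sqlite3` module. This sketch assumes an SQLite library recent enough for common table expressions; the original's `WHERE true` is a no-op filter and is omitted so the script does not depend on the TRUE keyword. The second number in the `sqlite_stat1` entry for `t1ab` is the average number of rows per distinct "a" value, which is the figure the planner works with.

```python
import sqlite3

# Rebuild the scenario from the check-in notes in an in-memory database.
con = sqlite3.connect(":memory:")
con.executescript("""
  CREATE TABLE t1(a,b,c);
  WITH RECURSIVE
     c1(x) AS (VALUES(1) UNION ALL SELECT x+1 FROM c1 WHERE x<5000),
     c2(y) AS (VALUES(0) UNION ALL SELECT y+1 FROM c2 WHERE y<10)
  INSERT INTO t1(a,b,c) SELECT x*1000, 2*(y/4), x*1000+y FROM c1, c2;
  WITH RECURSIVE
     c1(x) AS (VALUES(1) UNION ALL SELECT x+1 FROM c1 WHERE x<50000)
  INSERT INTO t1(a,b,c) SELECT 4000, 2*x, 99 FROM c1;
  CREATE INDEX t1ab ON t1(a,b);
  ANALYZE;
""")


def scalar(sql: str):
    """Run a query that returns a single value and return that value."""
    return con.execute(sql).fetchone()[0]


total = scalar("SELECT count(*) FROM t1")
distinct_a = scalar("SELECT count(DISTINCT a) FROM t1")
rows_4000 = scalar("SELECT count(*) FROM t1 WHERE a=4000")
# Smallest and largest group sizes among all the other "a" values.
groups = "SELECT count(*) AS n FROM t1 WHERE a<>4000 GROUP BY a"
other_min = scalar(f"SELECT min(n) FROM ({groups})")
other_max = scalar(f"SELECT max(n) FROM ({groups})")
stat = scalar("SELECT stat FROM sqlite_stat1 WHERE idx='t1ab'")

print(total, distinct_a, rows_4000, other_min, other_max)
print("sqlite_stat1 for t1ab:", stat)
```

With this data the script should report 105000 rows, 5000 distinct "a" values, 50011 rows for "a=4000" and 11 rows for every other value, and the `sqlite_stat1` entry should begin with "105000 21".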

Consider these two queries:

  SELECT * FROM t1 WHERE a=4000 AND b IN (1,3,5,7,9,11,13);
  SELECT * FROM t1 WHERE a=4000 AND b IN (1,3,5,7,9,11,13,15);

The first query uses both columns of the index and hence does 7 separate index lookups to obtain the answer. The second query uses the optimization of this check-in: it seeks to the first "a=4000" entry and then scans forward, checking each row's "b" value against the IN list. The planner expects only 21 rows, so it judges the scan to be faster than 8 binary searches. But for this one particular value of "a" there are actually 50011 rows to scan, so the scan is significantly slower than doing the 8 searches.
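The trade-off in the two queries above can be illustrated with a rough, assumed cost model (not the actual src/where.c code): K index lookups cost about `K * log2(N)`, while a scan of M rows costs about `2 * M * log2(K)`, the factor 2 standing in for the slight bias toward lookups. The helper `smallest_scan_k` is hypothetical; it returns the smallest IN-list size for which this model would prefer the scan. With N = 105000 and the planner's estimate M = 21 it returns 8, which happens to agree with the 7-versus-8 behavior described above; with the true count for "a=4000", M = 50011, the scan would only win for IN lists of roughly 100000 entries.

```python
import math
from typing import Optional


def smallest_scan_k(n_rows: int, m_rows: int, bias: float = 2.0,
                    limit: int = 1_000_000) -> Optional[int]:
    """Smallest IN-list size K for which the assumed model prefers a scan.

    Assumed model (illustrative only):
      lookups: K * log2(N)
      scan:    bias * M * max(log2(K), 1)
    Returns None if no K up to `limit` qualifies.
    """
    lookup_per_key = math.log2(n_rows)
    for k in range(1, limit + 1):
        scan_cost = bias * m_rows * max(math.log2(k), 1.0)
        if scan_cost < k * lookup_per_key:
            return k
    return None


N = 105000
print("planner estimate, M=21:", smallest_scan_k(N, 21))  # 8
print("actual for a=4000, M=50011:", smallest_scan_k(N, 50011))
```

The gap between the two results shows why a single average in sqlite_stat1 can mislead the planner for one heavily repeated key.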

Context

2018-06-09
00:09		Avoid invoking the whereLoopAddOr() routine in the query planner if there are no OR operators in the WHERE clause, thus speeding up query planning slightly. (check-in: 292724ffc4 user: drh tags: trunk)
2018-06-08
23:23		When the query planner has the opportunity to use an IN operator constraint on a term of an index other than the left-most term, use the estimated number of elements on the right-hand side of the IN operator to determine whether it makes sense to use the IN operator with index lookups, or simply to scan the range of the table identified by the index terms to the left. Only do this if sqlite_stat1 measurements are available, as otherwise the performance estimates will not be accurate enough to discern the best plan. Bias the decision slightly in favor of using index lookups on each element of the IN operator. (check-in: 2cbbabdf5e user: drh tags: trunk)
21:21		Only choose to scan an IN operator rather than use an index if we have real STAT1 data to suggest it is advantageous. (Closed-Leaf check-in: 30e874661d user: drh tags: in-scan-vs-index)
19:13		Fix an assert() that can be false for a corrupt database and a strange query that uses a recursive SQL function to delete content from a corrupt database file while it is being queried. (check-in: 99057383ac user: drh tags: trunk)

Changes

Changes to src/where.c.

Changes to test/in6.test.

Changes to test/rowvalue4.test.