Discovered bug in xBestIndex of virtual tables
I've discovered that in some circumstances, xBestIndex won't report all constraints on a hidden column. Specifically, it happens when that column is constrained to another hidden column, but only when the query does not involve
JOINs. Let's look at a series of examples. For all examples, I'll be using the following virtual table declaration:
CREATE TABLE vtab(foo HIDDEN, bar HIDDEN, x, y);
First, let's look at some examples that behave correctly.
In this example, xBestIndex reports two constraints:
"bar" (in that order). This is the correct behavior.
SELECT vtab.* FROM vtab, other_table WHERE foo = ? AND bar = ?;
In this example, xBestIndex reports three constraints:
"foo" (in that order). Now, I'm not sure if that really makes sense, considering there's only two constraints in the
WHERE clause, but I suppose you could argue that
foo = bar can be reversed to imply a third constraint. I'll just give the benefit of the doubt and say this isn't a bug.
SELECT vtab.* FROM vtab, other_table WHERE foo = ? AND bar = foo;
This example is similar to #1, except we removed
other_table. This one still behaves correctly. xBestIndex reports two constraints:
"bar" (in that order).
SELECT * FROM vtab WHERE foo = ? AND bar = ?;
This example is similar to #2, except we removed
other_table. This is the case that causes the bug. You would except xBestIndex to report at least two constraints (or maybe three, like in example #2), but instead, only one constraint is reported, which is on
"foo". xBestIndex does not report any constraints on
SELECT * FROM vtab WHERE foo = ? AND bar = foo;
It would be great if case #4 could be fixed. Thank you!
It it is a bit strange.
In example 4, "bar = foo" cannot be used by the virtual table, and so is not passed to xBestIndex. For a virtual table to use a constraint, one side of the operation must be a virtual table column and the other side some value that is available before the virtual table scan is begun. In other words, there must be a value available to pass to xFilter. If both sides of the constraint are columns of the virtual table, neither are available before the scan is started and so the constraint cannot be used by the virtual table.
It's curious that example 2 gives you 3 constraints. Two would be correct I think - the explicit "foo = ?" and the implied "bar = ?" (if "bar=foo" and "foo=?", it must be the case that "bar=?").
I suspect the 3rd constraint is the code that finds the "bar=?" constraint being too aggressive - reasoning that since bar=? and bar=foo, there is a second foo=?.
Not sure why you don't get this transitive constraint in example 4. Quite possibly you should be...
Exactly, that's why I included example #2. There's clearly inconsistent behavior here.
(4.1) By Gunter Hick (gunter_hick) on 2021-05-14 06:03:38 edited from 4.0 in reply to 2.1 [link] [source]
In example #2 there is a JOIN and a constraint that resembles a JOIN constraint, so SQLite generates a constraint entry for both sides of foo == bar without further checks. In example #4 there is no join, so SQLite correctly concludes that checking two fields of the same row for equality does not constitute a checkable constraint. EDIT: Example #2 should have only 1 constraint, just like #4.
That's not true. The constraint in example #2 is not related to the JOIN at all. Both
bar are columns of the same table.
In fact, if you remove
foo = ?, then example #2 generates zero constraints. So it's clearly doing some transitive logic here. It knows that
bar = ? based on
foo = ? AND bar = foo. I think that's very clever and desirable behavior, but we don't get the same behavior in example #4 unfortunately.
I can confirm that there are some unexpected solutions on sqlite side
So I emulated these testes and for example #2 (with Join) xBestIndex indeed offers two suggestions, one of them containing 3 constraints, but the third constraint in this case is exactly the same as the first one ("foo" field). The second xBestIndex call offers three columns, but the third column has usable field set to 0, so it's effectively a 2-column option. And if I make the 3-column option cheap and 2-column undesirable (returning SQLITE_CONSTRAINT), I get "no query solution". But if 2-column option is ok on my side (SQLITE_OK with some cost), sqlite goes with it (and xFilter offers two values). So the 3-column index suggestion is probably some internal non-usable variant.
This 2-column result looks exactly like sqlite somehow resolves "bar" value transitively (I see real values in xFilter and real results when bind some actual values). And for example #4 when there's no join I also see only one xBestIndex suggestion with a single column constraint and no solution if I expect both values in xFilter.
So it looks like sqlite is able somehow transitively resolve values for xFilter, but probably this mechanics is not ideal at the moment. So in my opinion it's not a bug, but a case for improvement
Please try again using check-in cf63abbe559d04f9 or later. Report back your findings, please.
I just tested all four examples with the new check-in, and everything seems to be working as expected now. All four examples now report two constraints:
bar (in that order).
I did notice that if you change
bar = foo to
foo = bar (shown below), you get three constraints:
bar (in that order). But I think that's completely fine.
SELECT * FROM vtab WHERE foo = ? AND foo = bar
Thanks for fixing this!