Incorrect result might caused by INDEX

(1) By ChiZhang on 2023-03-04 02:08:42 [source]

Consider the following test case:

CREATE TABLE t0 (c0);
INSERT INTO t0(c0) VALUES (1);
CREATE INDEX i0 ON t0(c0 > 0);
CREATE VIEW v0(c0) AS SELECT AVG(t0.c0) FROM t0 GROUP BY 1>t0.c0;

SELECT COUNT(*) FROM t0 INDEXED BY i0 WHERE (SELECT COUNT(*) FROM v0 WHERE v0.c0 BETWEEN 0 AND 0); -- 1

This query produces 1, which I believe, is unexpected. Since v0.c0 has one row which equals to 1.0 and v0.c0 BETWEEN 0 AND 0 will return FALSE, so this subquery will return 0 and I expect the result of outer query to be 0. Note that this reproduces also on the latest trunk version (5a091911).

(2) By Richard Hipp (drh) on 2023-03-04 15:48:55 in reply to 1 [link] [source]

Now fixed on trunk.

Analysis

This problem was introduced by a new query planner optimization added to version 3.41.0 and identified as optimization (1a) on the change log. Preconditions to encounter the problem:

The query must contain an aggregate subquery.
The aggregate subquery must have a GROUP BY clause.
The GROUP BY clause must reference terms that are not included in the result set of the query.
The query planner must choose to implement the GROUP BY clause by doing a sort operation.
The outer query that contains the aggregate subquery must make use of indexed expressions.

If all of the above conditions are met, then a single variable in SQL parser data structure (the "abstract syntax tree" or "AST") that describes the aggregate subquery gets set to an incorrect value, resulting in incorrect byte-code and an incorrect answer.