Unexpected Result by Complicated JOINING Again

(1) By Jinsheng Ba (bajinsheng) on 2022-06-13 07:35:25 [source]

The second SELECT should return the same result as the first SELECT, but it doesn't.

CREATE TABLE t0(c0 INTEGER);
INSERT INTO t0 VALUES('x');
CREATE TABLE t1(c0 INTEGER);
INSERT INTO t1 VALUES('y');
CREATE TABLE t2(c0, c1 NOT NULL);
INSERT INTO t2 VALUES('a', 'b');
CREATE TABLE t3(c0 INTEGER);
INSERT INTO t3 VALUES('c');

SELECT * FROM t3 LEFT OUTER JOIN t2 INNER JOIN t0 ON t2.c1 RIGHT OUTER JOIN t1 ON t2.c0;  -- ||||y
SELECT * FROM t3 LEFT OUTER JOIN t2 INNER JOIN t0 ON t2.c1 RIGHT OUTER JOIN t1 ON t2.c0 WHERE (((t2.c1) ISNULL)); -- {} empty result
SELECT (((t2.c1) ISNULL) IS TRUE) FROM t3 LEFT OUTER JOIN t2 INNER JOIN t0 ON t2.c1 RIGHT OUTER JOIN t1 ON t2.c0; -- 1

Environment:

Version: 3.39.0

Commit ID: 342c501f

OS: Ubuntu 20.04

Configuration Options: ./configure --enable-all

Compiler: gcc-9

Client: CLI

(2) By Richard Hipp (drh) on 2022-06-13 12:09:16 in reply to 1 [link] [source]

Thanks for the bug report.

Here is a description of the underlying cause of the problem, for those who are interested: SQLite has an optimization that tries to convert expressions of the form "x ISNULL" into "FALSE" if the column "x" has a NOT NULL constraint. That optimization has to be disabled if the column "x" is on the nullable size of an outer join and is not inside the ON clause. All of this works. But in this case, SQLite is also using another optimization that converts the LEFT JOIN into an INNER JOIN. And that second optimization makes the "t2.c1 ISNULL" constraint think that it is no longer on the nullable side of an outer join, even though it is still on the nullable side of a different RIGHT JOIN. Hence, the optimization of converting "t2.c1 ISNULL" into "FALSE" is incorrectly applied.

We can easily make SQLite get the correct answer here by just disabling the optimization that converts "t2.c1 ISNULL" into "FALSE". But that will cause performance regressions. We are working now to figure out how to disable the ISNULL-to-FALSE optimization only when it is truly necessary.

(3) By Richard Hipp (drh) on 2022-06-13 12:43:41 in reply to 1 [link] [source]

Problem now fixed on trunk - check-in b1be2259e2e08ec2

(4) By doug (doug9forester) on 2022-06-13 14:37:13 in reply to 3 [link] [source]

Richard, I notice designing and coding RIGHT JOIN is a nightmare for you. There seem to be endless paths where it can break queries. Watching from afar, I get the feeling that one can never be sure that RIGHT JOIN is entirely correctly implemented.

Perhaps, there is a smarter way to proceed: you use path analysis tools in testing to tell you whether every path is executed. Is there a way to "tag" paths related to RIGHT JOIN code & data such that the results of running your total test suite highlights paths which were affected/not affected by RIGHT JOIN code and data? And then use that information to generate new tests and/or examine the not-affected paths?

Just thinking outside the box a bit ...

(5) By jose isaias cabrera (jicman) on 2022-06-13 15:20:58 in reply to 4 [link] [source]

Hi Doug.

Will you explain why do you think it's a nightmare? We've just seen a few fixes of very complicated SQL RIGHT JOINs. For a very new addition to SQLite, IMHO, it has very little problems. If we had seen 30+ reports, than I would have agreed with you.

josé

(6) By doug (doug9forester) on 2022-06-13 22:18:50 in reply to 5 [link] [source]

Apologies for the characterization of nightmare. From over here, it looks like a never-ending stream of "oops - missed that path". I thought if there were a way to increase the reliability of SQLite, that you might think about it.

(8) By Harald Hanche-Olsen (hanche) on 2022-06-14 07:35:40 in reply to 6 [link] [source]

I admit I was having similar thoughts at first. But then, I remembered that Richard has been at this for a very long time, far longer than I have ever pursued any single project, so he probably has a pretty good idea what he is doing. More to the point, I believe every line of code in SQLite is exercised in some test case or other. But truly comprehensive tests would have to exercise every code path in every conceivable circumstance, and that is just impossible to achieve.

I can only guess, but I imagine that what is happening here is that the knowledge that the only outer joins are LEFT joins has been used as a simplifying assumption many places in the code, and that the current task is to identify those places, remove the assumption, and adjust the code accordingly.

(9.1) By Ryan Smith (cuz) on 2022-06-14 09:20:01 edited from 9.0 in reply to 6 [link] [source]

In addition to other replies, I think the thing not to lose focus of is that SQLIte is just fine, this is a very new bit of development that is being shared/published as-they-go due to the nature of the SQLite project, and people are reporting issues in very new code that probably doesn't even have test-cases yet, which by the way is very helpful I think, but should not be taken to reflect on the code-stability or code-goodness of production code of SQLite at all.

(11.1) By jose isaias cabrera (jicman) on 2022-06-15 16:10:12 edited from 11.0 in reply to 6 [link] [source]

Apologies for the characterization of nightmare.

No apologies needed. :-) I been using SQLite since 2006, and I am a witness of how well design it is, and it has the best support of any software out there (IMHO). And to make it even better, It's free!

(10) By Larry Brasfield (larrybr) on 2022-06-14 15:32:47 in reply to 4 [link] [source]

Perhaps, there is a smarter way to proceed: you use path analysis tools in testing to tell you whether every path is executed.

Perhaps I'm missing something here, but I am unable to see path analysis by whatever means as something that would lead to fewer problems for the new RIGHT/OUTER JOIN feature. The problems have generally been unforeseen interactions between the new joins and optimizations already in place for the older joins. The issues have not been "whether every path is executed". Rather, they have been "whether all of the old paths remain appropriate in the face of new inputs" and "whether new paths are needed to preserve preconditions upon which old join optimizations were predicated".

There may be some development refinements that would help, but coverage testing or analysis is something that does the most good once the code to implement various features is stable.

(7) By Jinsheng Ba (bajinsheng) on 2022-06-14 07:28:33 in reply to 3 [link] [source]

Thanks for you detailed explanation! It is very useful for me to understand the bug.