SQLite Forum

Why do I need to LEFT JOIN pragma_table_info() ?

Why do I need to LEFT JOIN pragma_table_info() ?

(1) By Screwtape on 2024-03-24 03:40:10 [source]

I'm using the sqlite3 CLI from SQLite 3.45.1, as packaged by Debian Linux.

I've been playing with the pragma table-valued functions recently, and came up with the following query:

FROM pragma_table_list() AS t
    JOIN pragma_foreign_key_list(t.name, t.schema) AS f
    LEFT JOIN pragma_table_info(f."table", t.schema) AS i;

This query returns information about all the foreign key references in an SQLite database (well, it needs more filtering to be useful, but that's not part of my question). However, that "LEFT JOIN" there seems weird to me: there's no ON or USING clause, and since I'm calling pragma_table_info() with the name of a table that exists (because it came from pragma_table_list()) it should always return rows. And sure enough, in the databases I've tested with, when I run this query, the table_info fields are never NULL. So if I remove LEFT, I should get exactly the same query results:

FROM pragma_table_list() AS t
    JOIN pragma_foreign_key_list(t.name, t.schema) AS f
    JOIN pragma_table_info(f."table", t.schema) AS i;

...except that this query always returns zero rows, no matter what.

Some experimentation shows that "LEFT JOIN", "RIGHT JOIN", "FULL JOIN" and "CROSS JOIN" all return the results I expect, while "JOIN", "INNER JOIN", and the comma operator all return no results. This is extra confusing, because the SELECT documentation says 'The "CROSS JOIN" join operator produces the same result as the "INNER JOIN", "JOIN" and "," operators' and here it very much does not.

If this were an issue with table-valued functions in general, I would expect this to be an issue joining pragma_table_list() and pragma_foreign_key_list() as well, not just pragma_table_info().

I have tried to search the forum for previous discussions, and found the threads pragma_table_info - JOIN/FROM order - different behaviour. (CROSS JOIN), Left outter join on pragma tables acts as inner join, and RIGHT JOIN to "table valued function" broken from 3.40 but none of them seem to be exactly what I'm experiencing.

Is this a bug? If so, is "LEFT JOIN" a reasonable workaround, that won't cause problems when the bug is fixed? If it's not a bug, what am I missing?

(2.1) By Richard Hipp (drh) on 2024-03-24 22:13:08 edited from 2.0 in reply to 1 [link] [source]

Thanks for the report.

This problem was caused by an inaccuracy in the implementation of the xBestIndex method of the PRAGMA table-valued functions. The problem has now been fixed on trunk, on branch-3.45, and on Fiddle. See check-in bc516ff5202ee6e9 for the patch.

This problem goes all the way back to the introduction of the pragma table-valued functions in SQLite version 3.16.0, 2017-01-02.

(3) By Screwtape on 2024-03-25 02:49:28 in reply to 2.1 [link] [source]

Thank you very much!

(4) By ddevienne on 2024-03-25 08:15:17 in reply to 2.1 [link] [source]

Hi. As a v-table writer, I'm worried that I don't understand the fix...

The only changes are to estimatedRow/Cost, and while I'd expect those to affect the quality of the plan chosen,
possibly resulting in suboptimal performance, I would never expect these estimations, all other things being equal,
to affect the correctness of the result.

Could you please expand on the fix and why it affected correctness?
Any pointers in the vtable doc that could explain the faulty behavior?

(5) By Richard Hipp (drh) on 2024-03-25 11:10:59 in reply to 4 [link] [source]

Simplified test case:

SELECT * FROM t2 JOIN t3 JOIN pragma_table_info(t3.n,t2.s);

The first hidden column of the pragma_table_info virtual table is "tablename" and the second is the "schema" for the underlying PRAGMA:

PRAGMA schema.table_info(tablename)

Both hidden columns are an input only. The virtual table cannot discover the value of the tablename nor the schema if you don't tell it using a constraint or an argument to the table-valued function.

In a three-way join on table t1, t2, pragma_table_info, the query planner will invoke xBestIndex multiple times:

  1. Both tablename and schema are unknown.
  2. tablename is known but schema is not.
  3. schema is known but tablename is not.
  4. Both tablename and schema are known.

If 2 and 4 have the same cost, then the query planner might pick 2. In that case, the query planner will code an extra conditional "pragma_table_info.schema==t2.s" prior to returning each row. But pragma_table_info.schema always returns NULL, since schema is input-only. Thus no rows are returned.

To prevent this, we give a higher cost to xBestIndex call #2, so that call #4 is always chosen.

(6) By ddevienne on 2024-03-25 11:28:11 in reply to 5 [link] [source]

Thank you Richard.

I was missing the interaction of hidden and input-only columns.
It makes sense now. I appreciate you taking the time to explain it further.

(7) By Richard Hipp (drh) on 2024-03-25 11:37:22 in reply to 5 [link] [source]

Using SQLITE_CONSTRAINT as the return value from xBestIndex instead of setting goofy cost values is perhaps a better solution to this problem. Such a change is now checked in.