SQLite Forum

pragma_table_info - JOIN/FROM order - different behaviour.
Login

pragma_table_info - JOIN/FROM order - different behaviour. (CROSS JOIN)

(1.1) By midijohnny on 2022-04-07 22:00:01 edited from 1.0 [source]

Just reporting in case it is a bug; or perhaps just a gotcha with regard to how 'pragma' tables work.

Given the following:

CREATE TABLE abc(x,y,z);
CREATE TABLE greeting(hello,goodbye);
CREATE TABLE tab(t);
INSERT INTO tab VALUES('abc');
INSERT INTO tab VALUES('greeting');

When I join like this:

select name from tab CROSS JOIN pragma_table_info(t);

I get the intended output:

name
x
y
z
hello
goodbye

But If I switch the order to this:

select name from pragma_table_info(t) CROSS JOIN tab;

I get no results, the 't' column is apparently seen, because I get an error if I swap it like this:

sqlite> select name from pragma_table_info(badcol) CROSS JOIN tab;
Parse error: no such column: badcol
  select name from pragma_table_info(badcol) CROSS JOIN tab;
                       error here ---^
sqlite> .version
SQLite 3.39.0 2022-03-10 23:37:58 531e6ad0389c6c820bb8c64db5049fb6b0bffd30bd394fd8ee7412959b1752e2
zlib version 1.2.11
gcc-9.4.0

(2.2) By midijohnny on 2022-04-07 21:40:12 edited from 2.1 in reply to 1.0 [link] [source]

Looks like the keyword 'CROSS' is the culprit somehow. If I use a plain 'JOIN' keyword I get the results:

sqlite> select name from pragma_table_info(t) JOIN tab;
|  name   |
|---------|
| x       |
| y       |
| z       |
| hello   |
| goodbye |

And I just found this article about how 'CROSS JOIN' can affect sqlite3's handling of results - I assume this is somehow related.

EXPLAIN PLANS.

These don't mean too much to me: but I can see they are different:

Just JOIN:

sqlite> explain select name from pragma_table_info(t) JOIN tab;
addr  opcode         p1    p2    p3    p4             p5  comment      
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     13    0                    0   Start at 13
1     OpenRead       1     4     0     1              0   root=4 iDb=0; tab
2     VOpen          0     0     0     vtab:55C2578C5020  0   
3     Rewind         1     12    0                    0   
4       Column         1     0     3                    0   r[3]=tab.t
5       Integer        0     1     0                    0   r[1]=0
6       Integer        1     2     0                    0   r[2]=1
7       VFilter        0     11    1                    0   iplan=r[1] zplan=''
8         VColumn        0     1     4                    0   r[4]=vcolumn(1); pragma_table_info.name
9         ResultRow      4     1     0                    0   output=r[4]
10      VNext          0     8     0                    0   
11    Next           1     4     0                    1   
12    Halt           0     0     0                    0   
13    Transaction    0     0     3     0              1   usesStmtJournal=0
14    Goto           0     1     0                    0   

CROSS JOIN

sqlite> explain select name from pragma_table_info(t) CROSS JOIN tab;
addr  opcode         p1    p2    p3    p4             p5  comment      
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     15    0                    0   Start at 15
1     VOpen          0     0     0     vtab:55C2578C5020  0   
2     OpenRead       1     4     0     1              0   root=4 iDb=0; tab
3     Integer        0     1     0                    0   r[1]=0
4     Integer        0     2     0                    0   r[2]=0
5     VFilter        0     14    1                    0   iplan=r[1] zplan=''
6       Rewind         1     14    0                    0   
7         VColumn        0     6     3                    0   r[3]=vcolumn(6); pragma_table_info.arg
8         Column         1     0     4                    0   r[4]=tab.t
9         Ne             4     12    3     BINARY-8       83  if r[3]!=r[4] goto 12
10        VColumn        0     1     5                    0   r[5]=vcolumn(1); pragma_table_info.name
11        ResultRow      5     1     0                    0   output=r[5]
12      Next           1     7     0                    1   
13    VNext          0     6     0                    0   
14    Halt           0     0     0                    0   
15    Transaction    0     0     3     0              1   usesStmtJournal=0
16    Goto           0     1     0                    0   

(3) By Keith Medcalf (kmedcalf) on 2022-04-07 23:58:48 in reply to 2.2 [link] [source]

Correct. The CROSS keyword specifies that the LHS table must be in an outer loop with respect to the RHS table. Specifically tA CROSS JOIN tB means that tA must be in an outer loop with respect to tB. tB CROSS JOIN tA means that tB must be in the outer loop with respect to tA.

Leaving out the word CROSS means that the Query Planner is free to place either table in the outer loop with respect to the other -- that is to say that the keyword CROSS limits the freedom of the Query Planner to generate a query plan and forces the relative ordering of traversal of the tables.

That means that when you use the keyword CROSS in tab CROSS JOIN prag(tab.t) that you have specified that you want tab in the outer loop with respect to prag.

When you specify prag(tab.t) CROSS JOIN tab you are requiring that the prag(tab.t) be in an outer loop with respect to t. This will not generate the results you want (it will, however, do EXACTLY AND PRECISELY what you requested, even if that does not produce the result you intend).

If you leave out the CROSS keyword then the query planner will see that tab must be in an outer loop with respect to the prag table so that the parameter t can be sent from the table tab into the eponymous virtual table.

About the only thing that may be an issue is that mayhaps the query planner should detect that you have required an impossilbe nesting order and return you an error (similar to the one that you get if you specify an INDEXED BY but the index cannot be used) so that you know you are asking for something impossible. The TRS-DOS errormessage How? might be appropriate :)

(4) By midijohnny on 2022-04-08 08:26:06 in reply to 3 [link] [source]

Thanks for this: I guess this only applies to table valued functions? If so, I guess in an ideal world the Query Planner should bail with an error if presented with a combination of CROSS JOIN + Table-Valued-Function-Not-In-Last-Position: just to stop the human making mistakes.

I had used 'CROSS JOIN' in the spirit of 'code-as-comment': with the intention to indicate that I really did mean a Cartesian-Product - as opposed to forgetting an 'ON' clause for an INNER JOIN. (Which I would try and do if dealing with standard tables/views).

I had no idea about the implications to the Query Planner.

But now that I think about it : considering Table Valued Functions have a parameter to think about; they aren't identical to tables in that sense ; so maybe this isn't really a true Cartesian-join. (Since the param is acting like a filter/join criteria). So maybe its just 'buyer-beware'; if you are using Table-Valued-Functions - remember they have specific characteristics that you need to take account of.

(5) By Keith Medcalf (kmedcalf) on 2022-04-08 14:58:28 in reply to 4 [link] [source]

It would apply to tables as well. The virtual table pragma_table_info quacks like a function, but is a table. The arguments are called arg and schema. arg is the tablename string, and schema is the schema name string.

So, your queries were really:

select *
  from tab
  join pragma_table_info
    on arg == t;

Clearly this must be traversed tab -> pragma_table_info and attempting to go in the other direction will not work. (an External observer who knows that pragma_table_info is a table-valued-function that requires parameters to generate the correct result will know this, but to the query planner, it is just a table).

(7) By midijohnny on 2022-04-08 20:35:26 in reply to 5 [link] [source]

So really: I wasn't doing a CROSS JOIN at all - it was an INNER JOIN as it had an implicit 'ON' clause.

I'll have to think about why (in the case of ordinary tables) the SQL you provided above wouldn't work either.

(6) By Bjoern Hoehrmann (bjoern) on 2022-04-08 17:40:21 in reply to 3 [link] [source]

Some months ago drh added documentation on how virtual tables should handle missing required parameters. It is likely that older built-in virtual tables do not follow those recommendations. If xBestIndex returned SQLITE_CONSTRAINT here, I believe you would get an error message that no solution can be found for the given parameters. The json1 virtual tables also do not return SQLITE_CONSTRAINT, so SELECT * FROM JSON_EACH is empty rather than an error.

(8) By midijohnny on 2022-04-08 20:38:49 in reply to 6 [link] [source]

Thanks for this additional insight. Do you happen to know of any built-in virtual tables which do follow the guidelines? (So I can experiment for my own edification).

(Maybe 'generate_series'?)

(9) By Bjoern Hoehrmann (bjoern) on 2022-04-08 22:38:37 in reply to 8 [link] [source]

Yes, according to https://sqlite.org/forum/forumpost/c83977a7e6b05431?t=h