Table aliases on parenthesized results of JOINS and set operations (UNION etc.) -- feature or chance?

(1) By anonymous on 2020-10-04 17:26:43 [source]

My understanding from all literature at my disposal was that table aliases 
can be bestowed upon either literal table names or on subqueries.

I have now noticed that if you parenthesise a join, 
as in the following example:

------------------------------------------------------

CREATE TABLE foo (id, wkd, a, b, c);
INSERT INTO foo VALUES ('Yes',   'Mon', 1, 2, 3),
                       ('No',    'Tue', 2, 3, 5),
                       ('Maybe', 'Wed', 3, 4, 7);

CREATE TABLE bar (id, a, b, c);
INSERT INTO bar VALUES ('Yes',   1, 2, 4),
                       ('No',    2, 3, 5),
                       ('Maybe', 3, 6, 7);


UPDATE foo AS f
   SET a     = ba.a,
       c     = ba.c,
       wkd   = ba.wkd
       FROM
       (bar JOIN (SELECT 'Thu' AS wkd)) AS ba
 WHERE f.id = ba.id;

------------------------------------------------------

the table alias works perfectly (both on wkd and on the id column).

It also seems possible to assign an alias to the result of a set operation,
such as UNION (Example uses table "foo" from above):


------------------------------------------------------------

SELECT take_two.a FROM
(SELECT * FROM foo UNION ALL SELECT * FROM foo) AS take_two;

------------------------------------------------------------


also works.

I am now wondering if I am just too stupid to follow obvious logic or 
whether this is a feature that's merely undocumented (or the documentation 
of which I haven't been able to find), or some behaviour that just came 
about unintentionally.

Whatever the cause, the behaviour seems useful to me, but since one should 
never rely on undocumented features, I would like to inquire:

- is this intentional?
- is it realiably supported?
- is it official?

If so, I would appreciate very much finding it in the documentation. I WOULD 
like to use it, but am weary of writing code that may break at some point in 
the future.

Many thanks in advance!

Florian

(2) By Larry Brasfield (LarryBrasfield) on 2020-10-04 18:25:07 in reply to 1 [link] [source]

That this behavior is documented can be seen in [the railroad chart for "table or subquery". So, rely on it and be happy.

(3) By Keith Medcalf (kmedcalf) on 2020-10-04 19:20:12 in reply to 1 [link] [source]

(bar JOIN (SELECT 'Thu' AS wkd)) as ba

is a subquery which is the result of joining the table bar with the table that is the result of the singleton query (select 'Thu' as Wkd). That is, it is the same as the (universally correct) spelling:

((select * from bar) JOIN (select 'Thu' as Wkd)) as ba

Is this valid syntax:

SELECT * FROM (a JOIN (b JOIN c) as d) as e

It certainly is in SQLite3. Is it everywhere? Not a clue. Is this behaviour required by the standard? Not a clue there either.

The root of the issue is when the parser sees the open-parenthesis does it interpret that as commencing a subquery or as merely an order-of-operations marker?

If the former, then parsing will always work correctly. If the latter then the parser, when it sees the ) AS has to go back and "whoa, that open-parenthesis started a subquery so now we have to go back and do it again properly this time".

The SQLite3 parser is a "go forwards single pass" parser and would at that point have no action other than to declare an "error". But it does not because all open-parenthesis where a table-name is required means that we are commencing a subquery.

If the spec requires that (a JOIN b) JOIN c is different than a JOIN (b JOIN C) (ie, that the parenthesis are meaningful) then a bare-table-name is merely "syntactic sugar" for the subquery select * from bare-table-name until all mere "order of operation" parenthesis are replaced by subquery opening parenthesis. Of course, a JOIN b JOIN c may be any of may be any of six possible descent orders depending on whatever the query planner believes is the most efficient order at the time of execution.

Richard might be able to add some meaningful comment here, but I will note that treating the (...) where a table expression is required as an order-of-operations designator rather than a subquery designator both defeats the entire purpose of SQL as a declarative language and makes parsing more difficult.

Yes, there are come people (all from Microsoft) who seem to believe that () always designate order of operations and are in love with putting them everywhere they possibly can, but this is (to be polite) somewhat ill-conceived.

The second form:

SELECT take_two.a FROM
(SELECT * FROM foo UNION ALL SELECT * FROM foo) AS take_two;

is perfectly standards compliant SQL that should be processed by everything claiming to support named subquerys (where compound subsquerys are not unsupported, which I do not think is anything you will find to exist at present).

(4) By anonymous on 2020-10-07 16:35:09 in reply to 2 [link] [source]

Thank you! Unfortunately, however, after having stared hard at this diagram for some time, I still fail to grasp what you mean. 

To stick with the nice railroad metaphor, I'll refer to "tracks" 1 (the topmost route) to to 4 (the bottommost route). What I can see is that:



In track 1, "table-name" is followed by an optional "AS table-alias".

In track 2, "table-function-name (expr, ...)" is followed by an optional "AS table-alias".

In track 3, "(table-or-subquery, ...)", alternatively, "(join-clause)", is not followed by anything.

In track 4, "(select-stmt)" is followed by an optional "AS table-alias".



The track where I would (still, even after thinking hard) expect this behaviour to be documented is track 3. 

What am I overlooking?

(5) By anonymous on 2020-10-07 19:30:46 in reply to 3 [link] [source]

I'll start with the easy thing first: Thank you for poining me to the fact that a compound select statement (what I -- ambiguously -- called a "set operation" above) is actually just an extended subselect. That indeed follows from the railroad diagram, if one doesn't overlook that the main diagram is called select-stmt ...

As to the apparently more tricky question, thank you very much for your thoughtful explanation! It occurs to me almost on a regular basis that in a select statement, table-name is semantically the same as SELECT * FROM table-name, and with your explanations, I can see why it makes sense for a parser to interpret

SELECT foo, bar FROM (table-name)

as exactly the same as

SELECT foo, bar FROM (SELECT * FROM table-name)

(the example is trivial to keep it minimal).

Since I am concerned with whether one can rely on this behaviour, I have (even though your remarks about even you having "no clue" should probably have deterred me) made a feeble attempt to consult the SQL standard about this, only to learn that you have to buy it, which I wasn't prepared to do. It's a really weird thing there is no such thing as a reference implementation (such as the W3C has for HTML up to version 4), given that SQL has been around for so long. But even if the standard said something about this, this would not automatically mean that SQLite follows it.

So, what to do? It would be nice to use that behaviour, as it makes the code more compact, but what the parser does is (while interesting) technically none of the SQL programmer's business. Currently, the conclusion seems to me that the only safe option would be to write such a "named join" as an explicit subselect. Of course, I would be extremely interested if Richard Hipp had any comments on this, so I'll come back to check here.

Thanks very much again! Florian

(6) By Larry Brasfield (LarryBrasfield) on 2020-10-08 18:22:46 in reply to 4 [link] [source]

You overlook nothing; I was too hasty. Upon reexamination of the syntax charts, I have to say that the parse is a bit more subtle than I had thought. I lack the time to elaborate now, but: I think the key is to recognize the equivalence of the "table-or-subquery" and "join-clause" in the update syntax. I intend to see how it parses in the debugger to see whether the construction defined by parse.y maps to what the railroad charts say. (I am pretty sure the charts are generated from the grammar.)

If it is an undocumented feature, it is much more likely to become documented than to vanish due to the project's high valuation of backwards compatibility.