SQLite

View Ticket
Login
Ticket Hash: de3403bf5ae5f72ed6638f6360339254c8c62f53
Title: Column name shift due to LEFT JOIN query flattening
Status: Fixed Type: Code_Defect
Severity: Important Priority: Immediate
Subsystem: Unknown Resolution: Fixed
Last Modified: 2017-07-31 16:42:56
Version Found In: 3.19.0
User Comments:
drh added on 2017-07-29 03:26:32: (text/x-fossil-wiki)
The name of the output column on the SELECT statement below as reported by the
sqlite3_column_name() interface changed in between versions 3.18.0 and 3.19.0.

<blockquote><verbatim>
CREATE TABLE x(a,b);
INSERT INTO x VALUES(1,2);
SELECT x.a FROM x LEFT JOIN (SELECT * FROM x) AS y;
</verbatim></blockquote>

SQLite version 3.18.0 and earlier reports
the column name as "a".
But in SQLite 3.19.0 and 3.20.0, the
column name is reported as "x.a".

The SQLite documentation clearly states that, if there is no AS clause,
the names of output columns are indeterminate, arbitrary, and subject 
to change.  The column names are intended for display purposes only.
And yet, many applications depend on particular column names.
This name shift resulted at least one application error.

The problem was introduced by check-in [41c27bc0ff1d3135], which added a new
optimization for doing query flattening on the right operand of a LEFT JOIN.

<b>Analysis:</b>

Without an AS clause, if the output column is a direct reference to one
of the input table columns, then the output column name will be the same
as the input column name.  Except, if query flattening occurs, the output
column name becomes an exact copy of the input SQL text.

The exception explains the name shift.  Prior to version 3.19.0, the query
flattener would not operate on a LEFT JOIN.  Hence the output column name
was derived from the input column name and was "a".  But once the LEFT JOIN
query flattener optimization was added by check-in [41c27bc0ff1d3135], the
column naming exception mentioned in the previous paragraph kicked in and
caused the column to be named as it was typed in the input SQL:  "x.a".

drh added on 2017-07-29 17:47:12: (text/x-fossil-wiki)
<b>Column Naming Rules</b>

The names of the columns in the result set of a query can take three forms:

  1.  If there is an AS clause, the label to the right of the AS keyword
      becomes the column name.  This has always worked and has never been an
      issue.  When it doubt, the programmer can always put an AS clause onto
      the result expression to define the name of a column.

  2.  If the output column is a direct reference to a column of an input table,
      the the output column name is the name of the column in the input table,
      as it appears in the CREATE TABLE (or CREATE VIEW) statement for the
      input table.  (The previous sentence assumes that PRAGMA short_column_names
      if ON and PRAGMA full_column_names is OFF, which are the default settings.
      As both of those pragmas are deprecated, we will not consider the other
      cases.)  For example, if the table is "CREATE TABLE t1(x)", and if the
      SELECT statement refers to the column of t1 as "x" or "X" or "(X)" or
      "t1.x" or "main.t1.x" or "[main].[t1].[x]" or anything else, the output
      column name will always be just "x", as it appears in the CREATE TABLE
      statement.

  3.  If neither of the above apply, then the column name is a copy of the part
      of the original SQL statement that specified the column.  Examples:
      In the query "SELECT 5+11;" the name of the column is "5+11".  In the
      query "SELECT +x FROM t1" the "+x" is not a direct reference to the t1.x
      column because of the "+" operator, so the name of the output column is "+x".

Columns are named using form (1) if it applies, otherwise
use form (2) if it applies, and finally fall back to form (3) if none of the
other forms apply.

Except, since about 2009, if the query flattener
([https://sqlite.org/optoverview.html#flattening]) is involved, then columns of
the query that would otherwise use form (2) shift to using form (3).

The exception is a hack that was inserted in order to give sensible column
names when working with views.  The problem that the hack aimed to address
is demonstrated by this SQL:

<blockquote><verbatim>
CREATE TABLE x(a,b);
INSERT INTO x VALUES(1,2);
CREATE VIEW y(c,d) AS SELECT a,b FROM x;
SELECT c, d FROM y;
</verbatim></blockquote>

On expects the column names from the final query to be "c" and "d".  But the
query flattener runs in this case and transforms the input query into 
"SELECT a,b FROM x".  With no adjustments, the names of the output columns
for the revised query would be "a" and "b".  The hack to fix this was to
change the column naming rule from (2) to (3) so that the columns would continue
to be called "c" and "d" as they were shown in the input text.

<b>The Problem</b>

That exception to the column naming rules that kicks in when the query flattener
runs is the cause of the bug reported in this ticket.  The original query
used the form "t1.x" to specify the column in the result set.  In SQLite 3.18.0
and earlier, rule (2) above applied and the output column name was just "x".
Then in SQLite 3.19.0, the query flattener was enhanced so that it could be
applied to the query in question, making the query run faster.  But a side-effect
was that the column naming rule shifted from (2) to (3) and the output column
name became "t1.x".

<b>Proposed Solution</b>

The proposed solution (implemented by check-in [09834279aeca3bda]) is to eliminate
the exception to the column naming rules associated with the query flattener.
That means that the query that provoked this ticket would continue to use
column naming rule (2) and would continue to see the column name as "x" as
the application expects.  This change is accomplished by moving the column
naming step ahead of the query flattener step, so that column naming happens
before query flattening, not afterwards.

<b>Possible Problems</b>

The proposed solution makes column names more consistent and easier to understand.
It also makes the implementation smaller and simpler and helps it to run a little
faster.  However, there may be legacy programs that depended on the old behavior
(shifting the column naming rule from (2) to (3) when using the query flattener)
and those legacy programs could potentially break due to this change.