SQLite

View Ticket
Login
Ticket Hash: de3403bf5ae5f72ed6638f6360339254c8c62f53
Title: Column name shift due to LEFT JOIN query flattening
Status: Fixed Type: Code_Defect
Severity: Important Priority: Immediate
Subsystem: Unknown Resolution: Fixed
Last Modified: 2017-07-31 16:42:56
Version Found In: 3.19.0
User Comments:
drh added on 2017-07-29 03:26:32:

The name of the output column on the SELECT statement below as reported by the sqlite3_column_name() interface changed in between versions 3.18.0 and 3.19.0.

CREATE TABLE x(a,b);
INSERT INTO x VALUES(1,2);
SELECT x.a FROM x LEFT JOIN (SELECT * FROM x) AS y;

SQLite version 3.18.0 and earlier reports the column name as "a". But in SQLite 3.19.0 and 3.20.0, the column name is reported as "x.a".

The SQLite documentation clearly states that, if there is no AS clause, the names of output columns are indeterminate, arbitrary, and subject to change. The column names are intended for display purposes only. And yet, many applications depend on particular column names. This name shift resulted at least one application error.

The problem was introduced by check-in [41c27bc0ff1d3135], which added a new optimization for doing query flattening on the right operand of a LEFT JOIN.

Analysis:

Without an AS clause, if the output column is a direct reference to one of the input table columns, then the output column name will be the same as the input column name. Except, if query flattening occurs, the output column name becomes an exact copy of the input SQL text.

The exception explains the name shift. Prior to version 3.19.0, the query flattener would not operate on a LEFT JOIN. Hence the output column name was derived from the input column name and was "a". But once the LEFT JOIN query flattener optimization was added by check-in [41c27bc0ff1d3135], the column naming exception mentioned in the previous paragraph kicked in and caused the column to be named as it was typed in the input SQL: "x.a".


drh added on 2017-07-29 17:47:12:

Column Naming Rules

The names of the columns in the result set of a query can take three forms:

  1. If there is an AS clause, the label to the right of the AS keyword becomes the column name. This has always worked and has never been an issue. When it doubt, the programmer can always put an AS clause onto the result expression to define the name of a column.
  1. If the output column is a direct reference to a column of an input table, the the output column name is the name of the column in the input table, as it appears in the CREATE TABLE (or CREATE VIEW) statement for the input table. (The previous sentence assumes that PRAGMA short_column_names if ON and PRAGMA full_column_names is OFF, which are the default settings. As both of those pragmas are deprecated, we will not consider the other cases.) For example, if the table is "CREATE TABLE t1(x)", and if the SELECT statement refers to the column of t1 as "x" or "X" or "(X)" or "t1.x" or "main.t1.x" or "[main].[t1].[x]" or anything else, the output column name will always be just "x", as it appears in the CREATE TABLE statement.
  1. If neither of the above apply, then the column name is a copy of the part of the original SQL statement that specified the column. Examples: In the query "SELECT 5+11;" the name of the column is "5+11". In the query "SELECT +x FROM t1" the "+x" is not a direct reference to the t1.x column because of the "+" operator, so the name of the output column is "+x".

Columns are named using form (1) if it applies, otherwise use form (2) if it applies, and finally fall back to form (3) if none of the other forms apply.

Except, since about 2009, if the query flattener (https://sqlite.org/optoverview.html#flattening) is involved, then columns of the query that would otherwise use form (2) shift to using form (3).

The exception is a hack that was inserted in order to give sensible column names when working with views. The problem that the hack aimed to address is demonstrated by this SQL:

CREATE TABLE x(a,b);
INSERT INTO x VALUES(1,2);
CREATE VIEW y(c,d) AS SELECT a,b FROM x;
SELECT c, d FROM y;

On expects the column names from the final query to be "c" and "d". But the query flattener runs in this case and transforms the input query into "SELECT a,b FROM x". With no adjustments, the names of the output columns for the revised query would be "a" and "b". The hack to fix this was to change the column naming rule from (2) to (3) so that the columns would continue to be called "c" and "d" as they were shown in the input text.

The Problem

That exception to the column naming rules that kicks in when the query flattener runs is the cause of the bug reported in this ticket. The original query used the form "t1.x" to specify the column in the result set. In SQLite 3.18.0 and earlier, rule (2) above applied and the output column name was just "x". Then in SQLite 3.19.0, the query flattener was enhanced so that it could be applied to the query in question, making the query run faster. But a side-effect was that the column naming rule shifted from (2) to (3) and the output column name became "t1.x".

Proposed Solution

The proposed solution (implemented by check-in [09834279aeca3bda]) is to eliminate the exception to the column naming rules associated with the query flattener. That means that the query that provoked this ticket would continue to use column naming rule (2) and would continue to see the column name as "x" as the application expects. This change is accomplished by moving the column naming step ahead of the query flattener step, so that column naming happens before query flattening, not afterwards.

Possible Problems

The proposed solution makes column names more consistent and easier to understand. It also makes the implementation smaller and simpler and helps it to run a little faster. However, there may be legacy programs that depended on the old behavior (shifting the column naming rule from (2) to (3) when using the query flattener) and those legacy programs could potentially break due to this change.