Colons in column names?

(1) By Michael Allman (msa) on 2020-03-29 04:08:01 [link] [source]

Transcript:

[msa@dandy database]$ sqlite3 
SQLite version 3.31.1 2020-01-27 19:55:54
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> create table t1(id);
sqlite> create table t2(id, t1Id);
sqlite> insert into t1 values (0); insert into t2 values (1,0);
sqlite> .headers on
sqlite> select t1.id, t2.id from t1 join t2 on t1.id = t2.t1Id;
id|id
0|1
sqlite> select * from (select t1.id, t2.id from t1 join t2 on t1.id = t2.t1Id);
id|id:1
0|1

Can someone tell me why these two queries return different column names? And can I get SQLite to not do that? I'm using a SQLite framework that looks for table columns by their original schema name, and it's not finding the columns with the :1 appended.

Cheers.

(2) By Stephan Beal (stephan) on 2020-03-29 05:07:54 in reply to 1 [link] [source]

The column names are always undefined unless you use the "AS" modifier. You simlly need to use "select x as y" to force sqlite to use the name "y". (Pardon the brevity of my example - writing from a tablet.)

(3) By Keith Medcalf (kmedcalf) on 2020-03-29 07:17:15 in reply to 1 [link] [source]

The column names in the first query are both "id" because by default the alias-name for a value that derives from <table>.<column> is <column>. If the value is an expression, then the default name is the expression.

You can specify an explicit aliasname with the AS keyword. Aliasnames do not have to be unique (but if they are not unique then you cannot uniquely identify a column by its aliasname).

When you use a "subquery" (as in the second example), the result columns must have unique aliasnames otherwise you could not refer to the column unambiguously in the outer query. So the aliasnames are "uniquificated" from left to right by appending :n to duplicates. That is, if a subquery contains 3 fields with the aliasname "id", then they will be called, from left to right, "id", "id:1", "id:2". If this were not the case then the outer query would not be able to access any of the "id" columns by name because the reference would be ambiguous.

This is the case even if you use specified duplicated aliasnames in the query:

sqlite> select t1.id as x, t2.id as x from t1 join t2 on t1.id = t2.t1Id;
x           x
----------  ----------
0           1
sqlite> select * from (select t1.id as x, t2.id as x from t1 join t2 on t1.id = t2.t1Id);
x           x:1
----------  ----------
0           1

The solution is to may sure you use unambiguous aliasnames or make sure that all column names are unique. This is why some people love to prefix the column names of a table with the table name -- it is not because they like the extra typing, it is because of badly designed tools (such as your framework) that make assumptions that are not manifest.

(7) By Michael Allman (msa) on 2020-03-29 10:04:44 in reply to 3 [link] [source]

Hi Keith,

Thanks for your thoughtful reply. I've received many here, actually. :)

It looks like I missed a relevant excerpt from the SQLite SELECT query documentation:

A subquery specified in the table-or-subquery following the FROM clause in a simple SELECT statement is handled as if it was a table containing the data returned by executing the subquery statement.

A table cannot have multiple columns with the same name. In a sense, we seem to be in the land of undefined behavior. Do you know of a piece of documentation that specifies the "uniquification" strategy SQLite uses? I have not found reference to the ":#" convention. Incidentally, I've seen some unusual column names under the circumstances, like id:4294631212.

In fairness to the framework, I am abusing a feature that's meant to be used with select lists from joins. I'm attempting to bend it to subqueries. I just think I can implement my query without subqueries. I'm going to continue looking for a less-than-painful solution.

One final thought. PostgreSQL seems to be okay with duplicate column names from subqueries. Check this out:

http://sqlfiddle.com/#!17/929a1/5/1

I did not take the time to delve deeply into their documentation, but I suspect they simple lay out the column names left to right from the underlying relation.

(8) By Keith Medcalf (kmedcalf) on 2020-03-29 11:56:20 in reply to 7 [link] [source]

Line 2012 in select.c is where the column names are made unique:

    /* Make sure the column name is unique.  If the name is not unique,
    ** append an integer to the name so that it becomes unique.
    */
    cnt = 0;
    while( zName && sqlite3HashFind(&ht, zName)!=0 ){
      nName = sqlite3Strlen30(zName);
      if( nName>0 ){
        for(j=nName-1; j>0 && sqlite3Isdigit(zName[j]); j--){}
        if( zName[j]==':' ) nName = j;
      }
      zName = sqlite3MPrintf(db, "%.*z:%u", nName, zName, ++cnt);
      if( cnt>3 ) sqlite3_randomness(sizeof(cnt), &cnt);
    }
    pCol->zName = zName;

So after having :4 any more duplicates get random numbers:

sqlite> create table x(x);
sqlite> insert into x values (1);
sqlite> select x,x,x,x,x,x.x,x,x,x,x from x;
x           x           x           x           x           x           x           x           x           x
----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------
1           1           1           1           1           1           1           1           1           1
sqlite> select * from (select x,x,x,x,x,x.x,x,x,x,x from x);
x           x:1         x:2         x:3         x:4         x:1189230830  x:3251448327  x:2026156180  x:3116667273  x:3179755199
----------  ----------  ----------  ----------  ----------  ------------  ------------  ------------  ------------  ------------
1           1           1           1           1           1             1             1             1             1

Richard can probably answer why this is the case, but I would suspect that it is the old "Doctor, Doctor, it hurts when I poke myself in the eye!" to which the doctor replies "So don't poke yourself in the eye".

Note that the C API does return the correct metadata (column name, origin schema, origin table, origin column) if your framework asks for it. For example my modified APSW which returns objects for rows and retrieves all available column metadata shows the following (note I have to mangle the returned column names so that they are properly formed unique attribute names):

Python 3.8.2 (tags/v3.8.2:7b3ab59, Feb 25 2020, 23:03:10) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import apsw
>>> db = apsw.Connection()
>>> db.execute('create table x(x)')
>>> db.execute('insert into x values (1)')
>>> row = db.execute('select * from (select x,x,x,x,x,x,x,x,x from x)').fetchone()
>>> row
Row(x=1, x1=1, x2=1, x3=1, x4=1, x3711181259=1, x4278608952=1, x3945153751=1, x2452083445=1)
>>> row.x3711181259
1
>>> row._colnames
('x', 'x1', 'x2', 'x3', 'x4', 'x3711181259', 'x4278608952', 'x3945153751', 'x2452083445')
>>> row._orgnames
('x', 'x:1', 'x:2', 'x:3', 'x:4', 'x:3711181259', 'x:4278608952', 'x:3945153751', 'x:2452083445')
>>> row._colorig
('main.x.x', 'main.x.x', 'main.x.x', 'main.x.x', 'main.x.x', 'main.x.x', 'main.x.x', 'main.x.x', 'main.x.x')

_colnames are my mangled attribute names
_orgnames are the original column names reported by SQLite
_colorig are the origin schema.table.column reported by SQLite

And for your original query:

Python 3.8.2 (tags/v3.8.2:7b3ab59, Feb 25 2020, 23:03:10) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import apsw
>>> db = apsw.Connection()
>>> db.execute('create table t1(id);')
>>> db.execute('create table t2(id, t1Id);')
>>> db.execute('insert into t1 values (0); insert into t2 values (1,0);')
>>> row = db.execute('select * from (select t1.id, t2.id from t1 join t2 on t1.id = t2.t1Id);').fetchone()
>>> row
Row(id=0, id1=1)
>>> row._colnames
('id', 'id1')
>>> row._orgnames
('id', 'id:1')
>>> row._colorig
('main.t1.id', 'main.t2.id')

(4.1) By Decker (d3x0r) on 2020-03-29 07:41:18 edited from 4.0 in reply to 1 [link] [source]

Hmm that's an interesting note I thought it might be a property of the shell...

but this is what I got... (shared keys are split by table alias, if different, so [0].t1.id or [0].id.t1 or [0].id[0] are really all the same value.

I tried to test with MySQL(mariadb) and the second query is an error, without an alias...

select * from (select t1.id, t2.id from t1 join t2 on t1.id = t2.t1Id) t;
-- not final 't' added... and then it complains duplicate column id.

mind you these are deprecated https://www.sqlite.org/pragma.html#pragma_full_column_names

(no pragma )
[ { t1: { id: 0 }, t2: { id: 1 }, id: [ 0, 1, t1: 0, t2: 1 ] } ]
[ { id: 0, 'id:1': 1 } ]

#pragma short_column_names=on
[ { t1: { id: 0 }, t2: { id: 1 }, id: [ 0, 1, t1: 0, t2: 1 ] } ]
[ { id: 0, 'id:1': 1 } ]

--pragma short_column_names=off
#pragma long_column_names=on
[ { 't1.id': 0, 't2.id': 1 } ]
[ { 'subquery_1.id': 0, 'subquery_1.id:1': 1 } ]

#pragma short_column_names=on
#pragma long_column_names=on
[ { 't1.id': 0, 't2.id': 1 } ]
[ { id: 0, 'id:1': 1 } ]

(5) By Stephan Beal (stephan) on 2020-03-29 07:47:38 in reply to 1 [link] [source]

Not to confuse the matter further, but you'll likely eventually come across queries formulated like:

select a a, b b, ...

That's semantically identical to:

select a as a, b as b, ...

Because the "AS" keyword is optional. (Including it arguably improves readability.)

Note that even when using "plain" names, like "a" and "b", the name of the result is technically unspecified unless "AS" is used (or implied, as in the first example above). The sqlite docs used to (years ago) have a sentence to that effect, but i'm unable to find it in the SELECT docs.

(6) By Clemens Ladisch (cladisch) on 2020-03-29 08:27:10 in reply to 1 [source]

The SQL-92 standard says:

   6.4  <column reference>

   <column reference> ::= [ <qualifier> <period> ] <column name>

   7.9  <query specification>

   <query specification> ::=
        SELECT [ <set quantifier> ] <select list> <table expression>

   <select list> ::=
          <asterisk>
        | <select sublist> [ { <comma> <select sublist> }... ]

   <select sublist> ::=
          <derived column>
        | <qualifier> <period> <asterisk>

   <derived column> ::= <value expression> [ <as clause> ]

   <as clause> ::= [ AS ] <column name>

   Syntax Rules

   9) Case:

      a) If the i-th <derived column> in the <select list> specifies
        an <as clause> that contains a <column name> C, then the
        <column name> of the i-th column of the result is C.

      b) If the i-th <derived column> in the <select list> does not
        specify an <as clause> and the <value expression> of that
        <derived column> is a single <column reference>, then the
        <column name> of the i-th column of the result is C.

      c) Otherwise, the <column name> of the i-th column of the <query
        specification> is implementation-dependent and different
        from the <column name> of any column, other than itself, of
        a table referenced by any <table reference> contained in the
        SQL-statement.


So when you write `SELECT tab.col FROM ...`, the output column name is
guaranteed even without an AS. But with the subquery, SQLite is forced
to invent a new name.