Ambiguous 'ambiguous error', is it normal or bug?

(1.1) By Tony Papadimitriou (tonyp) on 2020-05-01 09:28:11 edited from 1.0 [link] [source]

create table t(s);
select * from t a join t b order by s;            -- no error
select * from t a join t b order by length(s);    -- ambiguous column name: s

Why an error for the second case and not the first?

(2) By Keith Medcalf (kmedcalf) on 2020-05-01 09:33:25 in reply to 1.1 [link] [source]

Because the first one is a bug.

(3.1) By TripeHound on 2020-05-01 11:21:48 edited from 3.0 in reply to 2 [link] [source]

Interestingly (for some definition of "interesting"):

select 1 from t a join t b order by s;

also gives Error: ambiguous column name: s (at least in 3.30.1).

(4) By Ryan Smith (cuz) on 2020-05-01 11:35:40 in reply to 3.1 [source]

These are not that strange.

When a dataset is made, ordering happens on the produced dataset. In the case of

SELECT * FROM t a JOIN t b ORDER BY s;

You first get a full dataset (or however it is named internally) that has one s column (in the same way if it was aliased), so ordering the dataset by the s column is pretty easy and probably the reason why the QP doesn't trip up on it, and most importantly, MUST produce the correct result.

Trying to do:

SELECT 1 FROM t a JOIN t b ORDER BY s;

Produces no such s column in the produced dataset, so the ordering has to dig into a or b to find the referenced column to order by, but now it appears in both, and so it cannot assume and reports the ambiguity.

However, if my explanation holds water in any way - it seems that ... ORDER BY length(s); in the original example should also work. It probably trips up somewhere in the maze of decisions the QP has to check/make.

I won't call it a bug until one can show the SQL produced the wrong output. An error NOT produced where the context makes sense, yet other DBs would throw an error, is not a bug, it's a feature. Until of course the SQL result data is wrong.

(5) By Keith Medcalf (kmedcalf) on 2020-05-02 13:10:14 in reply to 4 [link] [source]

This would be correct if the query were thus:

sqlite> select * from (select * from t a join t b) order by s;
s           s:1
----------  ----------
1           1
1           2
1           3
2           1
2           2
2           3
3           1
3           2
3           3

However for a query which is thus:

sqlite> select * from t a join t b order by s;
s           s
----------  ----------
1           1
1           2
1           3
2           1
2           2
2           3
3           1
3           2
3           3

it should return an error because "s" is ambiguous. Unless of course the name of the duplicate column "s" were "s:1" and not "s". The only case in which "s" would not be ambiguous would be if the operation where constrained WHERE a.s IS b.s

Which "s" on which table the query planner might choose to put in an outer loop relating to the other "s" in "undefined" and as such the reference should be ambiguous.

Now if you say'd "order by 1" then it would be clear whichever value was first was what you intended, however, specifying "s" means that you could order by a.s or b.s which is clearly ambiguous.