Not understanding a subquery - Some guidance?
(1.1) By jeffv (jeffvanderdoes) on 2021-08-07 19:26:28 edited from 1.0 [link] [source]
Hi, I'm confused why the following doesn't work as expected and what I might be doing wrong/misunderstanding. -- queries to set up tables r & t create table r (ritm text); create table t (task text, ritm text, ud date); insert into r values ('r1'), ('r2'), ('r3'); insert into t values ('t1', 'r1', '2021-01-01'), ('t2', 'r1', '2021-01-02'), ('t3', 'r1', '2021-01-03'); insert into t values ('t4', 'r2', '2021-01-01'); -- query in question select * from r left outer join t on r.ritm = (select ritm from t where t.ritm = r.ritm order by date(ud) desc limit 1); what I get: r1|t1|r1|2021-01-01 r1|t2|r1|2021-01-02 r1|t3|r1|2021-01-03 r2|t4|r2|2021-01-01 r3||| what I was hoping for: r1|t3|r1|2021-01-03 r2|t4|r2|2021-01-01 r3||| Any ideas? My hope was to mimic an outer apply in sqlserver
No clue what "an outer apply in sqlserver" is.
Perhaps you should use your words?
(4) By jeffv (jeffvanderdoes) on 2021-08-07 23:24:03 in reply to 2 [link] [source]
Thanks Keith for your reply. My hope was to join r to t and pull the first t (if any) joining by ritm in both tables. Does that clarify what I was expecting? Obviously what i got was the joined rows regardless of limit clause. Hopefully I've clarified what I thought the result would be. Thanks, Jeff
Do you mayhaps mean:
-- queries to set up tables r & t drop table t; drop table r; create table r (ritm text); create table t (task text, ritm text, ud date); insert into r values ('r1'), ('r2'), ('r3'); insert into t values ('t1', 'r1', '2021-01-01'), ('t2', 'r1', '2021-01-02'), ('t3', 'r1', '2021-01-03'); insert into t values ('t4', 'r2', '2021-01-01'); with v (task, ritm, ud) as ( select task, ritm, max(ud) from t group by ritm ) select * from r left join v on r.ritm == v.ritm ;
Which, of course, is also this relatively simple select:
select r.ritm, task, t.ritm, max(ud) as ud from r left join t on r.ritm == t.ritm group by r.ritm, t.ritm ;
Maybe you should move the Final close parenthesis to after the
t.ritm = r.ritm
(6) By jeffv (jeffvanderdoes) on 2021-08-07 23:30:38 in reply to 3 [link] [source]
Gerry, Thanks for your reply. By moving final close parenthesis as you mention I get a single row back. My hope is to get three rows where the first join would return the most recent row (order by ud desc). So instead of current getting 5 rows which 4 and 5 are correct but the first three I would only expect 1 (thus the limit 1 clause) so in total I'd expect 3 rows back. Did this make any sense? Thanks, Jeff
Did you test the sub-select ? Does that give you the result you expected/wanted ?
(7) By jeffv (jeffvanderdoes) on 2021-08-08 00:02:17 in reply to 5 [link] [source]
Simon, Thanks for the reply. Interesting thought but sub select refers to outer select....hmmm trying to redo i came up with: select r.ritm, t.task, t.ud from t t left outer join r r on t.ritm = r.ritm limit 1 order by date(t.ud) desc; which gives an error becuase of limit 1 clause so executing without limit clause: select r.ritm, t.task, t.ud from t t left outer join r r on t.ritm = r.ritm order by date(t.ud) desc; which then I get: r1|t3|2021-01-03 r1|t2|2021-01-02 r1|t1|2021-01-01 r2|t4|2021-01-01 which is wrong for two things. One is I only want the first row of t where it matches r on ritm and two I would like to see r3 from r and nulls for fields joining on t. Is there another way to just test the subselect? Thanks, Jeff
The LIMIT clause goes at the end. After DESC.
The way around you put it, SQLite would do the limit first, making it choose one random row of the results first, and only then apply DESC to the one row it has, which would do nothing.
(8) By jeffv (jeffvanderdoes) on 2021-08-08 00:12:42 in reply to 1.1 [link] [source]
All, So this works as expected: select r.ritm, x.ritm, x.task, x.ud from r r left outer join ( select ritm, task, max(ud) ud from t group by ritm ) x on r.ritm = x.ritm but can someone explain why this query doesn't work? select * from r left outer join t on r.ritm = (select ritm from t where t.ritm = r.ritm order by date(ud) desc limit 1); sqlite supports limit clause, left outer joins... does it support correlated subqueries? I thought so. Thoughts? Thanks, Jeff
but can someone explain why this query doesn't work?
But it does work, exactly as coded.
The descent condition specified is True for every combination of r and t tuples so the result is, as expected, the cross join of those two tables.
Except, of course, that the descent condition is false for all r tuples where r.mitm == 'r3` (since there is no t.mitm == 'r3') hence the corresponding output r is a single null tuple.
(9) By Igor Tandetnik (itandetnik) on 2021-08-08 03:48:29 in reply to 1.1 [link] [source]
where t.ritm = r.ritm refers to the first (leftmost) occurrence of the name
t - that is, the outer
t, not the inner
t. The inner select is a correlated subquery, but not in the way you think; for each pair of rows in
t, it devolves to either
select ritm from t where true ... or
select ritm from t where false .... As a result, I believe the query is equivalent simply to
select * from r outer join t on (r.ritm = t.ritm)
You probably meant something like this (not tested):
select * from r left outer join t t1 on t1.task= (select task from t t2 where t2.ritm = r.ritm order by date(ud) desc limit 1);
(14) By jeffv (jeffvanderdoes) on 2021-08-09 14:58:13 in reply to 9 [source]
Thank you... this was enlightening. This query works as expected! Jeff
Which is exactly the same as this query:
select * from r left outer join t on t.task = ( select task from t where t.ritm = r.ritm order by ud desc limit 1 ) ;
which has a requirement that the value of
task is unique. If it is not, it will not work.
ud is already a datetime value (whether a naive ISO string, or some numeric representation of the datetime, so long as it is consistently defined across all rows) the wrapping the date function around
ud does nothing except (a) use CPU time for no useful benefit and (b) create duplicates where none existed.
In the correlated subquery "from t where t.ritm" does not refer to an outer variable. It will only ever refer to the inner table t.
Note that if
task is only unique within
ritm then you would have to use:
select * from r left outer join t on t.ritm = r.ritm and t.task = ( select task from t where t.ritm = r.ritm order by ud desc limit 1 ) ;
and if r(ritm, task) is not unique then these solution methods will not (and cannot be made) to generate the output you claim to want. You would have to depend on a method of projection (shown earlier) that does not have such dependencies.