SQLite Forum

Not understanding a subquery - Some guidance?
Login

Not understanding a subquery - Some guidance?

(1.1) By jeffv (jeffvanderdoes) on 2021-08-07 19:26:28 edited from 1.0 [link] [source]

Hi,

I'm confused why the following doesn't work as expected and what I might be doing wrong/misunderstanding.


-- queries to set up tables r & t
create table r (ritm text);
create table t (task text, ritm text, ud date);
insert into r values ('r1'), ('r2'), ('r3');
insert into t values ('t1', 'r1', '2021-01-01'), ('t2', 'r1', '2021-01-02'), ('t3', 'r1', '2021-01-03');
insert into t values ('t4', 'r2', '2021-01-01');

-- query in question
select * from r
left outer join t on r.ritm =
(select ritm from t where t.ritm = r.ritm order by date(ud) desc limit 1);

what I get:
r1|t1|r1|2021-01-01
r1|t2|r1|2021-01-02
r1|t3|r1|2021-01-03
r2|t4|r2|2021-01-01
r3|||

what I was hoping for:
r1|t3|r1|2021-01-03
r2|t4|r2|2021-01-01
r3|||

Any ideas?  My hope was to mimic an outer apply in sqlserver

(2) By Keith Medcalf (kmedcalf) on 2021-08-07 20:50:49 in reply to 1.1 [link] [source]

No clue what "an outer apply in sqlserver" is.

Perhaps you should use your words?

(4) By jeffv (jeffvanderdoes) on 2021-08-07 23:24:03 in reply to 2 [link] [source]

Thanks Keith for your reply.

My hope was to join r to t and pull the first t (if any) joining by ritm in both tables.  Does that clarify what I was expecting?  Obviously what i got was the joined rows regardless of limit clause.

Hopefully I've clarified what I thought the result would be.

Thanks,
Jeff

(10) By Keith Medcalf (kmedcalf) on 2021-08-08 07:05:30 in reply to 4 [link] [source]

Do you mayhaps mean:

-- queries to set up tables r & t
drop table t;
drop table r;
create table r (ritm text);
create table t (task text, ritm text, ud date);
insert into r values ('r1'), ('r2'), ('r3');
insert into t values ('t1', 'r1', '2021-01-01'), ('t2', 'r1', '2021-01-02'), ('t3', 'r1', '2021-01-03');
insert into t values ('t4', 'r2', '2021-01-01');

with v (task, ritm, ud)
  as (
         select task, ritm, max(ud)
          from t
      group by ritm
     )
   select *
     from r
left join v
       on r.ritm == v.ritm
;

(11) By Keith Medcalf (kmedcalf) on 2021-08-08 07:15:25 in reply to 10 [link] [source]

Which, of course, is also this relatively simple select:

   select r.ritm,
          task,
          t.ritm,
          max(ud) as ud
     from r 
left join t 
       on r.ritm == t.ritm
 group by r.ritm, t.ritm
;

(3) By Gerry Snyder (GSnyder) on 2021-08-07 23:03:15 in reply to 1.1 [link] [source]

Maybe you should move the Final close parenthesis to after the

t.ritm = r.ritm

(6) By jeffv (jeffvanderdoes) on 2021-08-07 23:30:38 in reply to 3 [link] [source]

Gerry,

Thanks for your reply.

By moving final close parenthesis as you mention I get a single row back.  My hope is to get three rows where the first join would return the most recent row (order by ud desc).  So instead of current getting  5 rows which 4 and 5 are correct but the first three I would only expect 1 (thus the limit 1 clause) so in total I'd expect 3 rows back.

Did this make any sense?

Thanks,
Jeff

(5) By Simon Slavin (slavin) on 2021-08-07 23:26:07 in reply to 1.1 [source]

Did you test the sub-select ? Does that give you the result you expected/wanted ?

(7) By jeffv (jeffvanderdoes) on 2021-08-08 00:02:17 in reply to 5 [link] [source]

Simon,

Thanks for the reply.

Interesting thought but sub select refers to outer select....hmmm trying to redo i came up with:

select r.ritm, t.task, t.ud from t t 
left outer join r r on t.ritm = r.ritm limit 1
order by date(t.ud) desc;

which gives an error becuase of limit 1 clause so executing without limit clause:

select r.ritm, t.task, t.ud from t t 
left outer join r r on t.ritm = r.ritm
order by date(t.ud) desc;

which then I get:

r1|t3|2021-01-03
r1|t2|2021-01-02
r1|t1|2021-01-01
r2|t4|2021-01-01

which is wrong for two things.  One is I only want the first row of t where it matches r on ritm and two I would like to see r3 from r and nulls for fields joining on t.

Is there another way to just test the subselect?

Thanks,
Jeff

(13) By Simon Slavin (slavin) on 2021-08-08 12:33:15 in reply to 7 [link] [source]

The LIMIT clause goes at the end. After DESC.

The way around you put it, SQLite would do the limit first, making it choose one random row of the results first, and only then apply DESC to the one row it has, which would do nothing.

(8) By jeffv (jeffvanderdoes) on 2021-08-08 00:12:42 in reply to 1.1 [link] [source]

All,

So this works as expected:

select r.ritm, x.ritm, x.task, x.ud
from r r 
left outer join 
(
   select ritm, task, max(ud) ud
   from t
   group by ritm
) x on r.ritm = x.ritm

but can someone explain why this query doesn't work?

select * from r
left outer join t on r.ritm =
(select ritm from t where t.ritm = r.ritm order by date(ud) desc limit 1);

sqlite supports limit clause, left outer joins... does it support correlated subqueries?  I thought so.

Thoughts?

Thanks,
Jeff

(12) By Keith Medcalf (kmedcalf) on 2021-08-08 11:36:05 in reply to 8 [link] [source]

but can someone explain why this query doesn't work?

But it does work, exactly as coded.

The descent condition specified is True for every combination of r and t tuples so the result is, as expected, the cross join of those two tables.

(16) By Keith Medcalf (kmedcalf) on 2021-08-09 23:04:40 in reply to 12 [link] [source]

Except, of course, that the descent condition is false for all r tuples where r.mitm == 'r3` (since there is no t.mitm == 'r3') hence the corresponding output r is a single null tuple.

(9) By Igor Tandetnik (itandetnik) on 2021-08-08 03:48:29 in reply to 1.1 [link] [source]

t in where t.ritm = r.ritm refers to the first (leftmost) occurrence of the name t - that is, the outer t, not the inner t. The inner select is a correlated subquery, but not in the way you think; for each pair of rows in r and t, it devolves to either select ritm from t where true ... or select ritm from t where false .... As a result, I believe the query is equivalent simply to select * from r outer join t on (r.ritm = t.ritm)

You probably meant something like this (not tested):

select * from r
left outer join t t1 on t1.task=
(select task from t t2 where t2.ritm = r.ritm order by date(ud) desc limit 1);

(14) By jeffv (jeffvanderdoes) on 2021-08-09 14:58:13 in reply to 9 [link] [source]

Thank you... this was enlightening.   This query works as expected!

Jeff

(15) By Keith Medcalf (kmedcalf) on 2021-08-09 22:57:38 in reply to 9 [link] [source]

Which is exactly the same as this query:

         select * 
           from r
left outer join t 
             on t.task = (
                          select task 
                            from t 
                           where t.ritm = r.ritm 
                        order by ud desc 
                           limit 1
                         )
;

which has a requirement that the value of task is unique. If it is not, it will not work.

if ud is already a datetime value (whether a naive ISO string, or some numeric representation of the datetime, so long as it is consistently defined across all rows) the wrapping the date function around ud does nothing except (a) use CPU time for no useful benefit and (b) create duplicates where none existed.

In the correlated subquery "from t where t.ritm" does not refer to an outer variable. It will only ever refer to the inner table t.

Note that if task is only unique within ritm then you would have to use:

         select * 
           from r
left outer join t 
             on t.ritm = r.ritm
            and t.task = (
                          select task 
                            from t 
                           where t.ritm = r.ritm 
                        order by ud desc 
                           limit 1
                         )
;

and if r(ritm, task) is not unique then these solution methods will not (and cannot be made) to generate the output you claim to want. You would have to depend on a method of projection (shown earlier) that does not have such dependencies.