SQLite Forum

Help understanding LEFT JOIN
Login

Help understanding LEFT JOIN

(1) By jose isaias cabrera (jicman) on 2021-02-19 19:27:20 [link]

Apologies for the long post, but the scenario is a bit complicated (to me :-)), I had to create these tables to show the problem I am having on a system I created.

What I am trying to do is to get a series of LEFT JOINs to bring me the correct response. Please imagine the following:

```
create table zMList (id, pid, a, yyyy, c, d, idate);
insert into zMList values (1,'p001', 10, 2019, 'n', 4, '2019-02-11');
insert into zMList values (2,'p002', 25, 2019, 'n', 4, '2019-02-11');
insert into zMList values (3,'', 32, 2019, 'n', 4, '2019-02-11');
insert into zMList values (4,'p004', 64, 2019, 'y', 4, '2019-02-11');
insert into zMList values (5,'p005', 35, 2019, 'y', 4, '2019-02-11');
insert into zMList values (1,'p001', 10, 2020, 'n', 4, '2019-02-12');
insert into zMList values (2,'p002', 2, 2019, 'n', 4, '2019-02-12');
insert into zMList values (3,'', 13, 2019, 'y', 4, '2019-02-12');
insert into zMList values (4,'p004', 44, 2019, 'y', 4, '2019-02-12');
insert into zMList values (1,'p001', 10, 2020, 'n', 4, '2019-02-13');
insert into zMList values (2,'p002', 82, 2019, 'n', 4, '2019-02-13');
insert into zMList values (3,'', 93, 2020, 'y', 4, '2019-02-13');
insert into zMList values (4,'p004', 45, 2020, 'n', 4, '2019-02-13');
insert into zMList values (5,'p005',75, 2020, 'y', 8, '2019-02-13');

create table zProjs (id, pid, g, h, i, j, idate);
insert into zProjs values (1,’p001', 1, 4, 'n', 4, '2019-02-11');
insert into zProjs values (2,’p002', 2, 3, 'n', 4, '2019-02-11');
insert into zProjs values (4,’p004', 4, 5, 'y', 4, '2019-02-11');
insert into zProjs values (5,’p005', 5, 3, 'y', 4, '2019-02-11');

create table zImport (id, nn, yyyy, c, d, idate);
insert into zImport values (1, 1, 2019, 'n', 4, '2019-02-11');
insert into zImport values (2, 7, 2019, 'n', 4, '2019-02-11');
insert into zImport values (4, 4, 2019, 'y', 4, '2019-02-11');
insert into zImport values (5, 5, 2019, 'y', 4, '2019-02-11');
insert into zImport values (1, 10, 2020, 'n', 4, '2019-02-12');
insert into zImport values (2, 2, 2019, 'n', 4, '2019-02-12');
insert into zImport values (4, 4, 2019, 'y', 4, '2019-02-12');
insert into zImport values (1, 10, 2020, 'n', 4, '2019-02-13');
insert into zImport values (2, 6, 2019, 'n', 4, '2019-02-13');
insert into zImport values (4, 9, 2020, 'n', 4, '2019-02-13');
insert into zImport values (5, 8, 2020, 'y', 8, '2019-02-13');

create table zdocs (id, dn, link, idate);
insert into zdocs values (1,'p001.xls', 'http://xls.com/p001.xls', '2019-02-11');
insert into zdocs values (1,'p001-a.xls', 'http://xls.com/p001a.xls', '2019-02-12');
insert into zdocs values (1,'p001-b.xls', 'http://xls.com/p001b.xls', '2019-02-13');
insert into zdocs values (4,'p004a.xls', 'http://xls.com/p003a.xls', '2019-02-22');
insert into zdocs values (4,'p004b.xls', 'http://xls.com/p003b.xls', '2019-02-23');
insert into zdocs values (5,'p005.xls', 'http://xls.com/p005.xls', '2019-02-11');

create table t2 (pid, WYear, co, amt, indate);
insert into t2 values ('p001', 2019,'aa', 100.0, '2019-02-13');
insert into t2 values ('p001', 2019,'ab', 100.0, '2019-02-13');
insert into t2 values ('p001', 2019,'ac', 100.0, '2019-02-13');
insert into t2 values ('p004', 2019,'d', 100.0, '2019-02-13');
insert into t2 values ('p002', 2020,'c', 100.0, '2019-02-13');
insert into t2 values ('p005', 2020,'a', 100.0, '2019-02-13');
insert into t2 values ('p005', 2020,'a', 100.0, '2019-02-13');
insert into t2 values ('p001', 2020,'aa', 100.0, '2019-02-14');
insert into t2 values ('p001', 2020,'ab', 100.0, '2019-02-14');
insert into t2 values ('p001', 2020,'ac', 100.0, '2019-02-14');
```

When I run this SELECT,

`
SELECT a.id, a.pid, a.yyyy, b.i, c.nn, d.dn, sum(e.amt)
FROM zMList a
    LEFT JOIN zProjs b ON a.id = b.id 
    LEFT JOIN zImport c ON a.id = c.id
    LEFT JOIN zdocs d ON a.id = d.id 
    LEFT JOIN t2 e ON a.pid = e.pid
WHERE 
    a.yyyy = 2020
    AND a.idate = (SELECT MAX(idate) from zMList where id = a.id)
    AND c.idate = (SELECT MAX(idate) from zImport where id = c.id)
    AND d.idate = (SELECT MAX(idate) from zdocs where id = d.id)
    AND e.indate = (SELECT MAX(indate) from t2 where pid = e.pid)
GROUP BY a.pid
;
`

I get,

`
1|p001|2020|n|10|p001-b.xls|300.0
4|p004|2020|y|9|p004b.xls|100.0
5|p005|2020|y|8|p005.xls|200.0
`

and I am expecting,

`
1|p001|2020|n|10|p001-b.xls|300.0
3||2020||||
4|p004|2020|y|9|p004b.xls|100.0
5|p005|2020|y|8|p005.xls|200.0
`

I have tried a bunch of other SELECTs to try to get it to show, but, just the first list with 3 records are coming out. I thought that if the first table listed on the FROM piece of the statement contained all the records that I want to display, and LEFT JOIN'ed it with other tables, all the records that are part of the first table will be displayed with empty values for the rest of the tables. Any thoughts?  Thanks.

josé

(2) By Keith Medcalf (kmedcalf) on 2021-02-19 20:11:19 in reply to 1

You have failed to account for NULL.

The conditions in the WHERE clause for tables zImport as c, zdocs as d, and t2 as e referencing the idate and id fields in each of those respectively do not permit the idate field of that table to be null.

This causes the LEFT JOIN to be a misstatement of an INNER JOIN.

However, if the LEFT JOIN is actually performed, then all the columns of the RHS table will be NULL.  And a test for equality where one of the arguments is NULL, is false.

So in your WHERE clause you need to change all the = to IS so that comparisons between nulls become TRUE rather than FALSE.

ie:

```
SELECT a.id, a.pid, a.yyyy, b.i, c.nn, d.dn, sum(e.amt)
FROM zMList a
    LEFT JOIN zProjs b ON a.id = b.id 
    LEFT JOIN zImport c ON a.id = c.id
    LEFT JOIN zdocs d ON a.id = d.id 
    LEFT JOIN t2 e ON a.pid = e.pid
WHERE 
    a.yyyy = 2020
    AND a.idate IS (SELECT MAX(idate) from zMList where id IS a.id)
    AND c.idate IS (SELECT MAX(idate) from zImport where id IS c.id)
    AND d.idate IS (SELECT MAX(idate) from zdocs where id IS d.id)
    AND e.indate IS (SELECT MAX(indate) from t2 where pid IS e.pid)
GROUP BY a.pid
;
```

(3) By jose isaias cabrera (jicman) on 2021-02-19 20:50:17 in reply to 2 [link]

Keith, in my country we use the term, "tu eres un toro!", which translates to "you are a bull!".  This is in all sense of the statement an exaltation to the person that it's being referred to, and in this case, to your knowledge. So, in other words, you're amazing. :-)  Thanks.