Lateral Join/Coroutine Bug, Suggestions for Why Lateral Should be Added
(1.1) By SeverKetor on 2024-07-22 04:13:24 edited from 1.0 [source]
I'm probably jumping the gun on this and maybe it's already known (or simply upcoming on the things to try list), but if you use an aggregate or window function in the lateral portion, and there is more than one row, and you reference an outside column in either the output rows or from clause, you get a segfault.
Works:
SELECT * FROM (VALUES (1)) JOIN LATERAL (select COUNT(*), column1);
SELECT * FROM (VALUES (1), (2)) JOIN LATERAL (select COUNT(*));
Segfaults:
SELECT * FROM (VALUES (1), (2)) JOIN LATERAL (select COUNT(*), column1);
Also as a bonus, I found another bug/oddity:
SELECT * FROM (VALUES (1)) JOIN LATERAL (select 1 LIMIT column1);
Parse error: no such column: column1
I tried to test this bit in Postgres, but couldn't get it to parse at all. It kept complaining of a syntax error at the semi-colon no matter what I tried.
With my whinging complete, thank you for working on this new feature. Looks like it could be quite handy in the future.
(2) By Richard Hipp (drh) on 2024-07-21 09:14:46 in reply to 1.0 [link] [source]
That's a good catch. Thanks for reporting it.
The issue is an adverse interaction between LATERAL and the new "VALUES-as-coroutine" optimization that was added in the previous release (item 5a in the release notes.
The current implementation of LATERAL is unable to interoperate with coroutines in subqueries to its left. All FROM clause elements to the left of the LATERAL must be actual tables, either tables in the database or materializations of subqueries or views. In the prototype LATERAL implementation, I work around this limitation by simply disabling the use of coroutines when LATERAL appears anywhere in the FROM clause.
However, in the VALUES-as-coroutine optimization that strategy won't work, because the coroutine that implements the VALUES clause is generated prior to the parser even reaching the LATERAL keyword.
So it appears that I'll need to rework the implementation of LATERAL so that it is able to interoperate with coroutines. That's easier said than done. It might well involve changes to the way coroutines are implemented, not just a change to the new LATERAL implementation.
Summary: I have your report. I will try to fix it. But doing so will take some time.
This is why the version control system has "branches" and why we use those branches for new features. :-)
(3) By Richard Hipp (drh) on 2024-07-21 10:39:04 in reply to 2 [link] [source]
As of the latest checkin on the branch, LATERAL subqueries are now able to referenced other subqueries to their left that are implemented as co-routines. This resolves the crash reported above. (The fix was easier than I thought it would be.)
I have not yet looked into the LIMIT problem, as that is more of a limitation than a bug. I'll consider that issue separately.
(4) By Richard Hipp (drh) on 2024-07-21 19:50:24 in reply to 1.0 [link] [source]
Here's a different but similar problem that I need to fix:
CREATE TABLE t1(a,b); INSERT INTO t1 VALUES(1,2); CREATE TABLE t2(c,d,e,f); INSERT INTO t2 VALUES(3,4,5,6); CREATE INDEX t2cd ON t2(c,d); SELECT c, a, sb FROM t2 JOIN LATERAL( SELECT a, sum(b) AS sb FROM t1 GROUP BY a HAVING sum(b)<d ) AS lx ON true ORDER BY a, c;
In this case, the t2 table which is to the left of the LATERAL subquery uses a covering index. The table itself is never opened. But the LATERAL subquery does not know that, and so tries to read the "d" column from the (unopened) table, rather than from the covering index.
(5) By Richard Hipp (drh) on 2024-07-22 00:59:20 in reply to 4 [link] [source]
All issues reported above have now been addressed, including both concerns on (1) and the new find at (4).
The Big Remaining Question:
Should this LATERAL query enhancement be merged into trunk and added to the next release? Or is LATERAL just an extra complication that doesn't really have a practical application? Is support for LATERAL just about being buzz-word compliant, or is LATERAL actually useful for something?
My default action will be to leave the LATERAL join optimization to languish on the branch. If you thinking having LATERAL is important, then try to convince me otherwise. Your best approach would be do demonstrate a real-world problem that LATERAL solves that would be difficult to resolve without LATERAL.
(6) By Roman (moskvich412) on 2024-07-22 02:15:43 in reply to 5 [link] [source]
Dear SQLiters,
I have no idea what LATERAL does, could grasp it on the first reading, but I am fascinated by Richard! He asks for usefulness of LATERAL upfront (I presume query can be written without it as most can without WITH but not without WITH RECURSIVE ), goes through the trouble of implementing it and then debugging in less than a day and after all that not putting into production !
Richard is curious and wants SQLite to be useful. Mental exercise is never a waste of time even if something does not go into production.
I am fascinated by all SQLite developers ! I am extremely thankful that SQLite exists. It makes my work so much easier.
Roman
(7) By SeverKetor on 2024-07-22 04:11:37 in reply to 5 [link] [source]
In my personal opinion I'd like to see it added (always nice to have an extra tool in you back pocket for when you need it), but after going over my own code it's not as applicable as I would hope (I found one query that generates some statistics which could use LATERAL to be more easily understood).
Looking online it seems people mainly suggest using LATERAL (or OUTER/CROSS APPLY for SQL Server) for a couple common things.
Deduplicate calculations without needing to nest subqueries. If you have calculated values being used to calculate other values, you can calculate the intermediate values that are used more than once in a LATERAL join (or if needed, multiple consecutive LATERAL joins referencing values calculated in previous ones) and reference that. You could have the length/width/height of whatever in a table and have to calculate several related things that use volume.
Join a table but limit the number of rows produced per row of the left table. Could want to get users' top 3 movies, assuming there's a users table and a movie ratings table, so you use a LATERAL join to select using userid ordered by rating limit 3, to get user A's favourites, user B's, etc.. Certainly doable without LATERAL but uglier, trickier, and potentially significantly slower.
Also it gets used for table valued functions, but SQLite already lets you do stuff like JOIN json_each(table.column). Adding LATERAL would I guess just be extending the existing implicit LATERAL joins SQLite does.
Overall I want it, but I think I'd be lying if I said it was important to have. It's like RIGHT/FULL joins; they're nice but I just rarely (but not never!) find myself writing a query that benefits from having it. Perhaps the reasoning you had for adding RIGHT/FULL joins would apply here as well, assuming it wasn't a consortium member making a request.
(8) By ddevienne on 2024-07-22 08:53:59 in reply to 5 [link] [source]
If SQLite's LATERAL
is compatible with PostgreSQL's, then that's a big win for the subset of people (like me) that use both PostgreSQL and SQLite, to have the same mental model and SQL for queries.
We have two implementations for interacting with data, using a legacy proprietary DB-like API, and a new PostgreSQL-based (thus SQL-based) one.
Long term, I'd like an SQLite-based 3rd impl (to work with local projects, when offline or for performance), and the closer SQLite's SQL is to PostgreSQL's, the better in my book.
That's admittedly perhaps a niche justification, but it matters to me at least. Thanks, --DD
(10) By Richard Hipp (drh) on 2024-07-22 13:30:02 in reply to 8 [link] [source]
PG accepts "LATERAL" before a table-valued function, though the PG documentation admits that this is just a "noise" keyword and has no effect, since the same table-valued function will do exactly the same thing without the LATERAL prefix. SQLite does not accept "LATERAL" before a table-valued function, since doing so would mean unnecessary complication in the parser and the noise keyword adds no value.
Other than that, I think the SQLite and PG implementations are compatible. Your help if verifying that assertion is appreciated.
(9) By Dave Mausner (dmausner) on 2024-07-22 13:25:30 in reply to 5 [link] [source]
After Googling LATERAL, I realized that I could have used this concept in a project I delivered just days ago. I required a complicated elapsed time value to be used in several expressions in one report.
The issue was, mainly, avoidance of writing that long expression several times. My solution was to create a succession of three views, each one supplying at least one complicated expression value needed in the next view. I then joined the views on rowid.
All this did was force Sqlite to create the long, messy query internally. Well, that's what computers are for.
I guess that in this case, I would have chosen LATERAL if it existed.
(11) By Daniel Steigerwald (steida) on 2024-10-05 14:37:00 in reply to 5 [link] [source]
Hi. Thank you for asking. I think it would be helpful for Evolu if I understood what it does correctly. I use CTEs a lot, and often, I need an aggregation as a subquery, but it can return only one value, so I have to repeat subqueries like that:
p(l, t, h1, h2) as (
select (select max(l) + 1 from message), null, null, null
union all
select p.l - 1, ifnull((
select t from message
where l = p.l - 1 and t > $t and (p.t is null or t < p.t) order by t limit 1
), p.t), (
select h1 from message
where l = p.l - 1 and t > $t and (p.t is null or t < p.t) order by t limit 1
), (
select h2 from message
where l = p.l - 1 and t > $t and (p.t is null or t < p.t) order by t limit 1
)
from p where p.l > 2
),
(13.1) By Daniel Steigerwald (steida) on 2024-10-06 09:46:12 edited from 13.0 in reply to 11 [link] [source]
I was searching for lateral join for SQLite and found this thesis :-)
https://db.cs.uni-tuebingen.de/theses/2021/jonatan-braun/thesis-braun-2021.pdf
Abstract: The popular RDBMS SQLite is held back by the lack of support for some modern database computation techniques. To adress parts of this issue this work presents an approach to add lateral joins to its list of features. This addition provides new possibilities for SQL queries and could make SQLite ready for fast in-database computation tools.
(14) By anonymous on 2024-10-06 16:59:08 in reply to 13.1 [link] [source]
interesting.
The ability to flatten without the FROM (around pg 26?)may be useful even if LATERAL is not pursued. The question i have with this in particular is whether that gives correct results in all cases. If not, then the question is whether that LATERAL implementation becomes too inefficient without it (and thus impractical ).
(12) By DrkShadow (drkshadow) on 2024-10-06 01:31:39 in reply to 5 [link] [source]
A couple days ago, a complex peformance issue came up: https://sqlite.org/forum/forumpost/66e07e0f5a
My solution is to calculate all possible scores for all possible things (once), as that's possibly better than running a nested subquery in a loop for each row.
However, using a LATERAL join here would be the best of all worlds: for each desired row, only, calculate the score; then, for any duplicate scores, calculate the winning one. The gain is that instead of calculating all scores for everything, including not-used things, including the case of choosing what to use in the equal-scores case, with the LATERAL JOIN you're just calculating the score for that current "thing" -- only the necessary "things" and potentially with a much more concise "equal-score" case.
(15.1) Originally by anonymous with edits by Richard Hipp (drh) on 2024-10-25 16:21:40 from 15.0 in reply to 5 [link] [source]
This is a simplified version of a FIFO inventory management issue, where longer expressions calculate how remaining quantities from previous purchases match with each sale.
Problem: in a recursive CTE, we need column B to depend on column A of the same row, while A depends on values from the previous row.
WITH RECURSIVE series(Id, A, B) AS ( SELECT 1, 10, 10 * 2 UNION ALL SELECT Id + 1, prev.B + 5 as A, -- A depends on previous B (prev.B + 5) * 2 as B -- B depends on current A (calculation repeated) FROM series prev WHERE Id < 5 ) SELECT * FROM series;
What makes this particularly challenging is that we can't split this into multiple CTEs or use subqueries, as the recursive table must appear exactly once in the FROM clause. Without LATERAL, we must repeat the calculation of A in B's expression.
DRH edit: fix formatting
(16) By Richard Hipp (drh) on 2024-10-25 16:25:29 in reply to 15.1 [link] [source]
How would you rewrite this query if you had LATERAL JOIN?
Can you pull down the latest code from the "lateral-join" branch (https://sqlite.org/src/tarball/lateral-join/sqlite-lateral-join.tar.gz) and compile it (https://sqlite.org/forum/forumpost/8aa172844fdac249) and then use the result to verify that your proposed use-case for Lateral Join actually works?