Recursive query to find the shortest path

(1) By Balaji Ramanathan (balaji) on 2021-05-01 17:31:57 [link] [source]

Hi,

I have written the following query to find the shortest path between two points in a travel log (table trip contains a bunch of trips between different points, and nodes and edges are created from this table).  I am able to run this with the LIMIT 1000 or LIMIT 10000 line to stop the recursion, but what I would really like to do is make sure that once I find a path from the origin to the destination, I cut off any path that is longer than the path I have already found (because there is no chance that will be shortest path since all the distances are positive).  But when I try to create another WHERE condition that refers to ShortestPath (commented out in the code below), I get an error about multiple references to the recursive query.

What is the correct way to get this done?  As you can tell, I am somewhat new to recursive queries and I have gone through the examples on the SQLite site (which is how I managed to get at least as far as I have managed to, so thank you very much for the effort that went into putting that very informative page together), but this has stumped me. Thank you.

WITH RECURSIVE OD ( Origin, Destination ) AS ( VALUES ( 'Some City', 'Some Other City' ) ), edges ( startpoint, endpoint, distance ) AS ( SELECT Places1.PlaceName AS startpoint, Places2.PlaceName AS endpoint, min(distance) AS distance FROM trip INNER JOIN Place Places1 ON Places1.PlaceID = trip.Origin INNER JOIN Place Places2 ON Places2.PlaceID = trip.Destination GROUP BY startpoint, endpoint ), ShortestPath ( O, D, OtoD, ShortestDistance ) AS ( SELECT OD.origin AS O, edges.endpoint AS D, OD.origin || '-' AS OtoD, edges.Distance AS ShortestDistance FROM edges INNER JOIN OD ON edges.startpoint = OD.Origin UNION ALL SELECT ShortestPath.O AS O, edges.endpoint AS D, ShortestPath.OtoD || ShortestPath.D || '-' AS OtoD, ShortestPath.ShortestDistance + edges.distance AS ShortestDistance FROM ShortestPath INNER JOIN edges ON ShortestPath.D = edges.startpoint WHERE instr(OtoD, edges.endpoint) = 0 /* and ShortestDistance <= (select min(ShortestDistance) FROM ShortestPath INNER JOIN OD ON ShortestPath.O = OD.origin AND ShortestPath.D = OD.destination)*/ limit 10000 ) SELECT O, OtoD || D as Path, D, ShortestDistance FROM ShortestPath INNER JOIN OD ON ShortestPath.O = OD.origin AND ShortestPath.D = OD.destination ORDER BY ShortestDistance Limit 1;

(2) By Balaji Ramanathan (balaji) on 2021-05-01 22:34:49 in reply to 1 [link] [source]

This is a very curious observation, but I thought I would post it here hoping that I can learn something from this. The final query (the actual SELECT) in the code above is:

SELECT O, OtoD || D as Path, D, ShortestDistance FROM ShortestPath INNER JOIN OD ON ShortestPath.O = OD.origin AND ShortestPath.D = OD.destination ORDER BY ShortestDistance Limit 1;

I realized that the first join condition is not needed because all the paths start from the specified origin anyways. So, I got rid of that, and made my final select as below:

SELECT O, OtoD || D as Path, D, ShortestDistance FROM ShortestPath INNER JOIN OD ON ShortestPath.D = OD.destination ORDER BY ShortestDistance Limit 1;

Surprise, surprise! The query takes 3 times as long to complete after this change. I am at a loss to find any reason why removing a useless, redundant join condition would cause a query to take longer, and that too significantly longer. I increased the limit in the recursive part fo 500000 for the tests, and instead of taking about 9 seconds to run, the query takes 25 seconds to run without the useless join condition. I ran the tests a dozen tests to make sure I was not seeing the effect of some system lag on my computer or something like that.

If somebody can provide me a plausible reason why asking SQLite to do extra, useless stuff enables it to accomplish a task 3 times faster, I am all ears! Thank you.

(3) By Keith Medcalf (kmedcalf) on 2021-05-01 23:34:35 in reply to 2 [source]

EXPLAIN QUERY PLAN (.eqp on) and EXPLAIN (.eqp full) may shed light.

(4) By Balaji Ramanathan (balaji) on 2021-05-02 20:39:54 in reply to 1 [link] [source]

So, is there no clean way to find the shortest path more reliably than by setting a high enough limit in the recursive query and hoping for the best? I found a 12-step shortest path between two places using 500000 as the limit (takes about 10 seconds to run), and the previous best was a 10-step solution that about 1000 miles longer when I set the limit at 250000. I really don't want to just experiment with the limit when I know that the correct way to find the shortest path is to prune branches based on what I have already found. Thank you.

(5) By Keith Medcalf (kmedcalf) on 2021-05-03 02:30:01 in reply to 4 [link] [source]

That is correct. A recursive query is an exhaustive search and barring artificial restrictions (such as LIMIT) will only exit once the recursion is exhausted. Unlike programmatic recursion, there is no way to "prune" branches based on what you have already found.

In other words, if you take, for example, map routing software searching for a route from point A to point B. The correct approach is not to LIMIT the number of steps in the recursion but to prohibit further searching of branches that are "too long", where you compute "too long" before asking the question. It may be, for example, that your search defines "too long" as a path distance more than X times the "as the crow flies" distance between point A and point B. If this does not generate a useful answer, then increase X. Lather rinse and repeat until an acceptable solution is found (or the search has taken so long in elapsed time that you can declare that you cannot get there from here).

(6) By anonymous on 2021-05-03 13:41:41 in reply to 1 [link] [source]

Perhaps this helps? https://github.com/abetlen/sqlite3-bfsvtab-ext

(7) By Balaji Ramanathan (balaji) on 2021-05-03 13:43:11 in reply to 5 [link] [source]

OK, that is disappointing, but at least, it looks like I am not missing anything. Thank you, Keith.

(8) By Balaji Ramanathan (balaji) on 2021-05-04 13:07:24 in reply to 6 [link] [source]

Thanks, that looks very interesting. I will take a look.

(9) By J-L Hainaut (JLHainaut) on 2021-05-05 15:45:41 in reply to 8 [link] [source]

You could also be interested by this reference:

https://staff.info.unamur.be/dbm/Documents/Tutorials/SQLfast/SQLfast-Case31-Shortest-path.pdf

It doesn't use recursive CTE's but a simple loop that implements Dijkstra's algorithm (complexity O(NlogN)) working on an SQLite database.

(10) By Balaji Ramanathan (balaji) on 2021-05-07 04:04:20 in reply to 9 [link] [source]

Thank you. Obviously, with looping, it is trivial to find the shortest path. You don't even need SQL to do it at all. I was hoping that there was some way to do something similar to dyjkstra's algorithm using a recursive query in SQL. Without the ability to do branch pruning, recursive CTE's are a lot less useful than I had thought they would be.

(11) By Keith Medcalf (kmedcalf) on 2021-05-07 04:56:36 in reply to 10 [link] [source]

-- see https://learnsql.com/blog/get-to-know-the-power-of-sql-recursive-queries/

create table edges
(
    fromNode    integer not null,
    toNode      integer not null,
    distance    real    not null default 1,
    speed       real    not null default 1,
    primary key (fromNode, toNode)
) without rowid;

insert into edges(fromNode, toNode, distance, speed) values
    (1, 2, 100,  40),
    (1, 3,  70,  50),
    (2, 3,  20,  40),
    (3, 2,  20,  40),
    (3, 4,  40,  50),
    (2, 5,  30,  50),
    (4, 5,  20,  40),
    (5, 6,  20,  20),
    (4, 6,  80, 100);

.parameter init
.parameter set :start 1
.parameter set :end 6
.parameter set :maxHops 10
.parameter set :maxCost 10
.parameter set :maxDistance 250

with paths (startAt, endAt, visited, hops, pathDistance, pathCost)
  as (
         select :start                      as startAt,
                :start                      as endAt,
                '/' || :start || '/'        as visited,
                0                           as hops,
                0                           as pathDistance,
                0                           as pathCost
      union all
         select startAt                     as startAt,
                toNode                      as endAt,
                visited || toNode || '/'    as visited,
                hops + 1                    as hops,
                pathDistance + distance     as pathDistance,
                pathCost + distance / speed as pathCost
           from paths, edges
          where fromNode == endAt
            and instr(visited, '/' || toNode || '/') == 0
            and instr(visited, '/' || :end || '/') == 0
            and hops < :maxHops
            and pathCost < :maxCost
            and pathDistance < :maxDistance
       order by pathCost
    )
select startAt, endAt, visited, hops, pathDistance, pathCost
  from paths
 where startAt == :start
   and endAt == :end
;

The ORDER BY in the recursive part selects which paths to open first.

(12) By J-L Hainaut (JLHainaut) on 2021-05-07 15:11:47 in reply to 10 [link] [source]

You are right, finding the shortest path through the brute force method (depth-first graph traversal while avoiding circuits) is easy to express with a CTE but implementing Dijkstra's algorithm is far less trivial. The main obstacle is that the current resultset of a CTE (the set of rows already computed) is "append only": one can augment it through the recursive member of the CTE but there is no way to delete or update its rows.

Personally, I'm (almost) sure that it is possible to design a CTE that simulates 'delete' and 'update' operations, for example by adding a 'version' attribute to its rows, and by selecting their last version, just like in a historical database. But I guess that the expression would be particularly complex, not to mention poorer performance. Sometimes, a carefully crafted procedural solution can be clearer and faster that its equivalent declarative expression !

As to your observation "You don't even need SQL to do it at all.", you also are right. Anyway, it's even possible to solve the shortest path problem through a Turing machine ;).

(13) By Balaji Ramanathan (balaji) on 2021-05-11 23:20:59 in reply to 12 [link] [source]

Actually, I am not looking to update the records already created or delete any of them, I am just looking for a way to control what gets added to the resultset based on the current contents of the resultset, but currently, it looks like even that is not possible with a recursive CTE. You can only control what gets added using global criteria that have to be decided upon a priori. Somewhat disappointing, but I learned something important about recursive CTE's that was not obvious in the documentation. Thank you to everyone on this forum who contributed to this learning.

(14) By Al_ (Al_Default) on 2023-01-08 08:45:51 in reply to 12 [link] [source]

For later users finding this post: implementing the Dijkstra algorithm using a TEMP TABLE and a recursive TEMP TRIGGER (and not a recursive WITH common table expression) is relatively straight forward. TRIGGERs allow updates to the table, as opposed to WITH recursive SELECT queries.

(15.1) By Aask (AAsk1902) on 2023-01-08 11:39:41 edited from 15.0 in reply to 11 [link] [source]

(16) By muxator on 2023-07-25 06:55:13 in reply to 14 [link] [source]

Hi, @Al_, do you know where one can find an example of this approach? Thanks!

(17) By J-L Hainaut (JLHainaut) on 2023-07-25 10:32:55 in reply to 16 [link] [source]

You will find some practical applications of recursive triggers in reference http://bit.ly/3Wm8IZT, Chapter 19, Section 19.7. The examples are written for SQLite.