Breadth-first search

(1) By DomX (Trafalgar) on 2020-08-21 21:57:43 [link] [source]

Dear guys!

I'm having a bit hard time today. I'm unable to build a breadth-first search in SQL like I need it. I tried to vary the versions at the paragraph 3.3 and 3.4 of https://sqlite.org/lang_with.html , but it didn't work as I expected it.

This is my first breadth-first search at all, so it was a bit difficult to get the theory to code anyway. I started with my "native" language Perl and got it running very precise, even the input keeps more information than needed. But I can't translate it to (working) SQL.

What I wrote using Perl:

#!/usr/bin/perl



use strict;                                                     

use warnings;                                                    

my($_sub_breadth_first_search);                                  

my @list                        = (                             

        { id            => "00001", neighbour   => "00002" },   # Start
        { id            => "00001", neighbour   => "00003" },   

        { id            => "00002", neighbour   => "00001" },   

        { id            => "00002", neighbour   => "05201" },   

        { id            => "00005", neighbour   => "00002" },   

        { id            => "00002", neighbour   => "00005" },   

        { id            => "05201", neighbour   => "00002" },   

        { id            => "00003", neighbour   => "00001" },   

        { id            => "00003", neighbour   => "00815" },   

        { id            => "00815", neighbour   => "00003" },   

        { id            => "00815", neighbour   => "00004" },   

        { id            => "00004", neighbour   => "00005" },   

        { id            => "00004", neighbour   => "00815" },   

        { id            => "00005", neighbour   => "00004" },   # dest
        );      # &access_DB("SELECT...");                       

$_sub_breadth_first_search      = sub {                         

        my $origin              = shift;                        

        my $destionation        = shift;                        

        my @queue               = ();                           

        my @done                = ();                            

    push(@queue,    {                                        
            id              => $origin,                      
            jumps           => 0,                            
            visited         => 1,                            
            former          => 0,                            
            });                                              

    while ( @queue ) {                                       
            my $u   = shift(@queue);                         

            foreach my $knot ( grep { $_->{id} eq $u->{id} } @list ) {
                    my $knot        = {     # Needed, to make a proper copy and not changing values of @list
                            id      => $knot->{neighbour},   
                            jumps   => $u->{jumps} + 1,      
                            former  => $u->{id},             
                            visited => 1,                    
                            };                               

                    if ( $knot->{id} eq $destionation ) {    
                            return($knot->{jumps});          
                            }                                
                    elsif ( !( grep { $knot->{id} == $_->{id} && $_->{visited} } @done ) ) {
                            push(@queue, $knot);             
                            }                                
                    }                                        

            push(@done, $u);                                 
            }                                                

    return(0); # destination not found
    };

print "Jumps: " . &{$_sub_breadth_first_search}("00001", "00005") . "n"; # returns needed jumps (distance) exit(0);

What I came up with SQL:

CREATE TABLE systems ( id integer primary key, name text not null ); CREATE TABLE neighbours ( system_id integer, neighbour_id integer, primary key ( system_id, neighbour_id ), foreign key ( system_id ) references systems(id), foreign key ( neighbour_id ) references systems(id) ); CREATE TABLE route_jumps ( origin integer, destination integer, jumps integer, primary key ( origin, destination ), foreign key ( origin ) references systems(id), foreign key ( destination ) references systems(id) );



CREATE VIEW missing_jumps as                                    

select s1.id origin, s2.id destination                          

from systems s1                                                 

inner join systems s2                                           

on s1.id != s2.id                                               

left join route_jumps                                           

on origin == s1.id                                              

and destination == s2.id                                        

where route_jumps.jumps is null                                 

and s1.id <= s2.id                                              

union                                                           

select s2.id origin, s1.id destination                          

from systems s1                                                 

inner join systems s2                                           

on s1.id != s2.id                                               

left join route_jumps                                           

on origin == s1.id                                              

and destination == s2.id                                        

where route_jumps.jumps is null                                 

and s1.id > s2.id;

WITH RECURSIVE queue ( origin, jumps, destination ) AS ( VALUES(1,0,5) -- WORK just for testing, later this will be a select on missing_jumps: SELECT origin, 0 jumps, destination FROM missing_jumps UNION SELECT n.neighbour_id, (queue.jumps + 1) jumps, queue.destination destination FROM neighbours n INNER JOIN queue ON n.system_id == queue.origin AND n.system_id != queue.destination ORDER BY jumps ) SELECT origin, destination, jumps FROM queue; -- max(jumps) jumps FROM queue GROUP BY origin, destination;

(2) By Larry Brasfield (LarryBrasfield) on 2020-08-21 22:58:25 in reply to 1 [source]

I have to wonder how you were set upon this quest. The "breadth-first" concept refers to an algorithm for visiting tree nodes. It is explicitly procedural. The SQL way of expressing queries is not procedural at all. It allows one to specify the properties that the result set will have rather than how or with what computational steps the query engine will produce that result set.

This mismatch suggests an approach that will at least produce an ordering of the result set that would match, first-to-last, the time ordering that could result from a breadth-first tree traversal procedure. Your query needs to keep track of the distance (in graph edges) from the root node for each tree node in the results. Then, by using an ORDER BY clause, the result ordering can be made to match the time ordering that the procedure could have produced. (I say "could have" because edges leading from a tree node can also be ordered, and "breadth-first" says nothing about that.)

(3.1) By Keith Medcalf (kmedcalf) on 2020-08-21 23:38:11 edited from 3.0 in reply to 1 [link] [source]

This is, in fact, breadth first.

That is because the recursion prefers making shorter paths longer rather than making longer paths longer -- all the paths of length 1 will be explored before all the paths of length 2 before all the paths of length 3 and so on and so forth.

In otherwords, solutions are sought "breadth first" (syntactic sugar for "searched by bro's and ho's at the same distance before aunt's and uncle's at the next level slithering down the tree").

If you changed the "ORDER BY jumps" to "ORDER BY jumps DESC" then the recursion would prefer making longer paths longer before making shorter paths longer -- what might be called "depth first" (syntactic sugar for "searched by aunt's and uncle's slithering down the tree before bro's and ho's at the same level").

If you leave out the "ORDER BY" entirely then the searching is by "looking order". That means that potential solutions are generated in the order in which you look for them. (Which should be noted appears to be recursing from bottom to top of the tree by breadth -- ie "bro's and ho's of the farthest aunt or uncle going up the tree path length by each).

(4) By DomX (Trafalgar) on 2020-08-22 07:28:09 in reply to 3.1 [link] [source]

I'm sorry, I'm not sure, what you want to tell me by explaining this to me. Much of this sounded like what I read at the SQLite documentation (espesscially the part of ORDER BY (breadth) and ORDER BY DESC (depth).

My Problem is actually: The Perl code produces one return, which is one single number for one origin+destination combination. But the SQL code generates numberless results of the same combination. I think what I'm missing is: How to get the procedure stopping (emptying the queue) when I reached the destination and counted the distance (jumps)?

(5) By Keith Medcalf (kmedcalf) on 2020-08-22 07:49:52 in reply to 4 [link] [source]

The recursive query will continue to run until it has enumerated all possible paths, and then you select the one you want. If you want the recursion to stop before it is finished then you have to not add more rows (stop recursing) once you have found the solution you are looking for.

That is, completion of recursion is the state at which there are no further entries in the queue waiting to be recursed.

(6) By DomX (Trafalgar) on 2020-08-22 08:03:25 in reply to 5 [link] [source]

Yeah, so far I understood. But I have two issues: 1st.: the clause AND n.system_id != queue.destination isn't stopping any further addings and 2nd: Usually the missing_jumps knew ALL directions, also backwards (I solved this through removing the double once:

1 → 5 is the same as 5 → 1

This means:

neighbours table (usually): 2 knows it's connected to 1 and 5 5 knows it's connected to 6 and 2 1 knows it's connected to 2 (dead end) 6 knows it's connected to 5 and 7 and 11 and 21. 7 knows it's connected to 6 and ... 11 knows it's connected to 6 and ... 21 knows it's connected to 6 and ... and so on...)

To not to redo former systems I use the @done array in Perl. If we already visited the system we won't progress it again. But I have no Idea, how I can make this happen in SQL nor how I empty the queue when I got all my information needed. (Analog to the if ( $knot->{id} eq $destionation ) { return($knot->{jumps}); } in Perl.)

(7) By Keith Medcalf (kmedcalf) on 2020-08-22 18:16:35 in reply to 6 [link] [source]

First of all, you are building PATHS not QUEUES. So the recursion will complete when there are no more entries for which recursion is possible. That is to say that your particular recursion will complete when all "beginnings" that can reach the any ending have been enumerated.

UNION means do not add entries that are already added, and UNION ALL means add entries even if they are duplicates.

There is no way to "empty the queue". It becomes empty when there is nothing more added and all entries have been processed. It is not a queue, it is an enumeration of solution steps.

In other words the interim recursive table does not "grow until a solution is found". It contains an exhaustive enumeration of every "path" meeting your testing requirement.

Your problem is that your data is not a tree but contains loops. Get rid of the loops and all will work just fine.

The problem is that the path from 1 -> 5 can go 1 -> 2 -> 5 with 2 jumps, 1 -> 2 -> 1 -> 2 -> 5 with 4 jumps, and so on and so forth --- so the enumeration of paths will never conclude because you can "loop around forever" without making any progress and you have specified no limit on the search depth.

create table pc
(
    parent integer not null,
    child integer not null,
    primary key (parent, child),
    unique (child, parent)
) WITHOUT ROWID;

insert into pc values (1,2),
                      (1,3),
                      (2,1),
                      (2,5201),
                      (5,2),
                      (2, 5),
                      (5201,2),
                      (3,1),
                      (3,15),
                      (15,3),
                      (15,4),
                      (4,5),
                      (4,15),
                      (5,4);

WITH paths (origin, jumps, destination)
  AS (
      VALUES (1, 0, 1)
     UNION
      SELECT paths.origin, (jumps+1), pc.child
        FROM pc, paths
       WHERE pc.parent == paths.destination
       LIMIT 100
     )
select * from paths;

Note that there are an infinite number of ways to get from 1 to 5 and that the minimum path length is 2. But since you can loop around and about over and over again, without something to terminate the recursion (such as a limit) it will carry of until the heat death of the multiverse.

(8) By DomX (Trafalgar) on 2020-08-22 22:04:48 in reply to 7 [link] [source]

Woah! You did it! I never thought of using the origin twice for the first value!! *__* I have to figure out, which limit is needed. And maybe I find a way to reduce the reverse loops, but the min() and group by does the basic job already! =)

Thanks a lot, Keith!!

(9) By Keith Medcalf (kmedcalf) on 2020-08-22 22:41:12 in reply to 8 [link] [source]

Something like this perhaps that does not allow a node to be revisited ... so you do not have to worry about there being loops in your neighbour table.

create table pc
(
    parent integer not null,
    child integer not null,
    primary key (parent, child),
    unique (child, parent)
) WITHOUT ROWID;

insert into pc values (1,2),
                      (1,3),
                      (2,1),
                      (2,5201),
                      (5,2),
                      (2, 5),
                      (5201,2),
                      (3,1),
                      (3,15),
                      (15,3),
                      (15,4),
                      (4,5),
                      (4,15),
                      (5,4);

WITH paths (origin, jumps, destination, path)
  AS (
      VALUES (1, 0, 1, '/1/')
     UNION
      SELECT paths.origin, (jumps+1), pc.child, path || pc.child || '/'
        FROM pc, paths
       WHERE pc.parent == paths.destination
         AND instr(path, '/' || pc.child || '/') == 0
     )
select *
  from paths
 where jumps > 0;

(10) By DomX (Trafalgar) on 2020-08-22 22:52:50 in reply to 9 [link] [source]

Man... I not even knew about instr(). XD
You're simply... A mastermind! I needed some time to find this function in the sqlite.org documentation...
Thank you so much! =)

(11) By Keith Medcalf (kmedcalf) on 2020-08-22 23:11:39 in reply to 9 [link] [source]

To enumerate all paths from anywhere to anywhere you could use this, for example:

WITH paths (origin, jumps, destination, path)
  AS (
      SELECT start, 0, start, '/' || start || '/'
        FROM (
              SELECT parent AS start
                FROM pc
             UNION
              SELECT child AS start
                FROM pc
             )
     UNION
      SELECT paths.origin, (jumps+1), pc.child, path || pc.child || '/'
        FROM pc, paths
       WHERE pc.parent == paths.destination
         AND instr(path, '/' || pc.child || '/') == 0
     )
select *
  from paths
 where jumps > 0;

(12) By DomX (Trafalgar) on 2020-08-22 23:23:49 in reply to 11 [link] [source]

This I already figured out within my infrastructure. ;-)
But I don't want to do 5 → 1 if I already have calculated 1 → 5, because it's the same distance and I can JOIN this by simply making an (1 == start AND 5 == dest) OR (5 == start AND 1 == dest).

More important, doubling the length of the full list calculating to get the way forth and back would make a huge increase of the list. (We're talking of > 32 mio rows one way. Doubling this would be a mess.)
And if I really want to have added both, I could do a insert select with reversing origin and destination columns. May be faster than calculate again 32M rows again. ;-)