3

Consider the following directed graph:

I'd like to be able to pick two nodes in this graph, and then find all of the possible paths between those two nodes. (Well, not all of the paths, just those paths not containing cycles.)

First, I model the graph in a view (as a transitive closure)

CREATE OR REPLACE VIEW closure AS 
WITH 
edges AS (
    -- for demonstration purposes, I just need to model the edges
    SELECT 'A' AS start_node, 'C' AS end_node FROM dual  UNION ALL
    SELECT 'A', 'D' FROM dual  UNION ALL
    SELECT 'B', 'C' FROM dual  UNION ALL
    SELECT 'B', 'D' FROM dual  UNION ALL
    SELECT 'C', 'E' FROM dual  UNION ALL
    SELECT 'D', 'F' FROM dual  UNION ALL
    SELECT 'E', 'A' FROM dual  UNION ALL
    SELECT 'E', 'F' FROM dual  UNION ALL
    SELECT 'E', 'G' FROM dual  UNION ALL
    SELECT 'F', 'B' FROM dual  UNION ALL
    SELECT 'F', 'G' FROM dual  
)
SELECT 
    CONNECT_BY_ROOT start_node                                          AS start_node,    
    end_node                                                            AS end_node,
    CONNECT_BY_ROOT start_node||sys_connect_by_path(end_node, ' -> ')   AS path,
    level + 1                                                           AS path_length
FROM 
    edges
CONNECT BY 
    NOCYCLE PRIOR end_node = start_node

-- this bit just adds the trivial paths to the closure
UNION ALL
SELECT DISTINCT start_node, start_node, start_node, 1 FROM edges
;

view CLOSURE created.

Now, suppose that I wanted to see all of the paths that start at 'E' and end at 'G'.

SELECT path, path_length 
FROM closure 
WHERE start_node='E' 
  AND end_node = 'G'
ORDER BY path_length ASC
;

PATH                            PATH_LENGTH
------------------------------  -----------
E -> G                                    2 
E -> F -> G                               3 
E -> A -> C -> E -> G                     5 
E -> A -> D -> F -> G                     5 
E -> F -> B -> C -> E -> G                6 
E -> A -> C -> E -> F -> G                6 
E -> A -> D -> F -> B -> C -> E -> G      8 

 7 rows selected 

I am surprised the the third, fifth, sixth, and seventh rows are returned. These four rows all contain a cycle (specifically, from 'E' to 'E').

Looking at the data in my view, it appears that the only cycles that are included are those that include the root node. (e.g., the cycle 'E' -> 'A' -> 'C' -> 'E') is only included in paths that begin with node 'E'... it is never included in paths that begin with any other node.)

I have tried to prevent these cycles from happening by adding a CONNECT_BY_ROOT start_node <> start_node in my "CONNECT BY" clause, but this is not allowed by Oracle.

How can I prevent these cycles from occurring in my data?

==== BEGIN EDIT ====

As @Egor Skriptunoff pointed out below, I am working with a table of edges, so maybe I should expect to see some vertices repeated, as long as they are not visited over the same edge. However, if this were the case, I should see the path 'E -> F -> B -> D -> F -> G' in the results above. This is a path that cycles through node 'F', once via edge 'E -> F' and once via 'D -> F'. However, this path is not there.

I presume this is because the NOCYCLE is preventing node 'F' from occurring twice. Is this really the case?

Oracle version: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

==== END EDIT ====

Paul Hanbury
  • 941
  • 1
  • 10
  • 24

2 Answers2

1

Your select clause is flawed - you have to apply CONNECT_BY_ROOT to the head of the edge instead of its tail

-- ...
CONNECT_BY_ROOT end_node                                  AS start_node,    
end_node                                                  AS end_node,
sys_connect_by_path(end_node, ' -> ')                     AS path,
level                                                     AS path_length
-- ...
Paul Hanbury
  • 941
  • 1
  • 10
  • 24
collapsar
  • 17,010
  • 4
  • 35
  • 61
  • it works fine on my system (ora 11g2) - only loop free paths are produced, namely EADFG, EFG and EG. I will provide the sample output later on when i'm at my desktop again. – collapsar Apr 10 '14 at 20:04
  • You are correct... I dismissed your answer too quickly because I saw that your `path`s would have been incorrect (but only slightly). Given your reassurance, I tried your solution, and it seems to work. I have edited your select clause (pending peer review) to give the appropriate `path`s. I will test this some more tomorrow, and if everything goes well, will accept this answer. – Paul Hanbury Apr 10 '14 at 21:39
  • There are a bunch of rows that are being duplicated... do you know why that is? – Paul Hanbury Apr 10 '14 at 21:49
  • yes, i just saw the initial self loop when i prepared the output. wrt duplicate rows, i do not know the reason for that. – collapsar Apr 10 '14 at 21:59
1

You are working with a table of edges, so it is repeated edges that are excluded, vertices may repeat.

Egor Skriptunoff
  • 23,359
  • 2
  • 34
  • 64
  • I thought that this makes a lot of sense. I added an edge between 'D' and 'E', expecting my closure view to generate a row for the path 'A -> D -> F -> B -> D -> E -> G' (having a cycle from 'D' and 'D', but not over any of the same edges), but there is no such row. In fact there is a row for 'A -> D -> F -> B', but no row that continues from there to 'D' again. (I'm going to update the question to reflect these changes.) – Paul Hanbury Apr 10 '14 at 19:51
  • Actually, I noticed that even without adding that extra edge, there is at least one path that would be included by your idea that don't exist in the closure. Namely, 'E -> F -> B -> D -> F -> G'. – Paul Hanbury Apr 10 '14 at 20:37