Given a list of tuples, I need to find all unique paths in it:

Example I/P: [(1,2),(2,3),(3,4),(9,11),(4,5),(5,6),(6,7),(3,9)]
O/P: [[(1,2),(2,3),(3,4),(4,5),(5,6),(6,7)],[(1,2),(2,3),(3,9),(9,11)]]

Two tuples can connect if the second element of one tuple matches the first element of the other, i.e. one tuple is (_,a) and the other is (a,_).

What is the most efficient implementation for this? I need to find the data structure best suited to it. Any suggestions? I will run the algorithm on more than 400,000 tuples.

Sibi
  • I think your output has an error. The last two paths can be merged into one. I submitted an edit, but correct me if I'm wrong – Anton Guryanov Feb 05 '13 at 11:12
  • I see a list of tuples there, not a tuple of lists. – Ingo Feb 05 '13 at 11:15
  • @AntonGuryanov Oh yes, it was my mistake. Updated the question. – Sibi Feb 05 '13 at 11:16
  • @Ingo Sorry, Corrected that. – Sibi Feb 05 '13 at 11:18
  • Can the same edge be used several times? E.g. if there was also an edge `(6,3)` in the input, would `..,(3,4),(4,5),(5,6),(6,3),(3,4),..` be okay? – Daniel Fischer Feb 05 '13 at 12:18
  • @DanielFischer Yes, that is OK as long as the generated path is unique. But it should not become a cycle. If it is about to become a cycle, it's better to end the path before that happens. – Sibi Feb 05 '13 at 15:36
  • What sort of output do you expect if there are possible branches in your input? Do you only want the paths with maximum length? Do you always have a single starting point? I.e. for input [(1,2), (2,3), (1,4), (4,5), (5,6), (7,4)], should your output contain [(7,4), (4,5), (5,6)] ? – Frank Schmitt Feb 05 '13 at 20:07
  • @FrankSchmitt I don't have a single starting point. I expect unique branches with maximum length. If I get a path `[1,2,3,4]` then I don't want a path `[2,3]` because it's just a sub-path of the former case. – Sibi Feb 06 '13 at 02:27

2 Answers

{-# LANGUAGE NoMonomorphismRestriction #-}
import Data.List (permutations, nub)

path :: Eq a => [(a, a)] -> [(a, a)]
path [] = []
path [x] = [x]
path (u@(_, a):v@(b, _):xs) = if a == b then u:path (v:xs) else [u]

allPaths = nub . map path . permutations

(you can optimize chain generation but I think this problem has exponential time complexity)

EDITED

In general, you must define more precisely which paths you want to return.

Ignoring the cycle invariant ([(1,2),(2,3),(3,1)] == [(2,3),(3,1),(1,2)]), you can generate all paths without using permutations:

{-# LANGUAGE NoMonomorphismRestriction #-}
import Data.List (nub, sortBy, isInfixOf)

data Tree a = Node a [Tree a] deriving Show

treeFromList :: Eq a => a -> [(a, a)] -> Tree a
treeFromList a [] = Node a []
treeFromList a xs = Node a $ map subTree $ filter ((a==).fst) xs
  where subTree v@(_, b) = treeFromList b $ filter (v/=) xs

treesFromList :: Eq a => [(a, a)] -> [Tree a]
treesFromList xs = map (flip treeFromList xs) $ nub $ map fst xs ++ map snd xs

treeToList :: Tree a -> [[a]]
treeToList (Node a []) = [[a]]
treeToList (Node a xs) = [a:ws | ws <- concatMap treeToList xs]

treesToList :: [Tree a] -> [[a]]
treesToList = concatMap treeToList

uniqTrees :: Eq a => [[a]] -> [[a]]
uniqTrees = f . reverse . sortBy ((.length).compare.length)
  where f [] = []
        f (x:xs) = x: filter (not.flip isInfixOf x) (f xs)

allPaths = uniqTrees . treesToList . treesFromList

then

*Main> allPaths [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 1)]
[[2,4,1,2,3,4],[2,3,4,1,2,4],[1,3,4,1,2,4],[1,3,4,1,2,3],[1,2,4,1,3,4],[1,2,3,4,1,3]]

uniqTrees is inefficient and, in general, there is room for many optimizations.

If you want to avoid the cycle invariant, you can normalize a cycle by selecting the rotation with the minimum base-10 representation. In the previous example ([(1,2),(2,3),(3,1)] == [(2,3),(3,1),(1,2)]), 1231 < 2312, so

normalize [(2,3),(3,1),(1,2)] == [(1,2),(2,3),(3,1)]

You can normalize a path by rotating it n times and taking "head . sortBy toBase10 . rotations".
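
The normalization idea can be sketched as follows. This is only a minimal sketch, not the answer's exact method: instead of a base-10 encoding it compares rotations lexicographically with the derived `Ord` instance for lists of tuples, which picks the same rotation when all nodes are single digits. The helper `rotations` is a hypothetical name introduced here for illustration.

```haskell
-- All rotations of a list, e.g. rotations [1,2,3] == [[1,2,3],[2,3,1],[3,1,2]].
-- (Hypothetical helper, not part of the code above.)
rotations :: [a] -> [[a]]
rotations xs = [drop k xs ++ take k xs | k <- [0 .. length xs - 1]]

-- Pick the lexicographically smallest rotation as the canonical form of a cycle.
-- Assumption: lexicographic order stands in for the "minimum base-10" order.
normalize :: Ord a => [(a, a)] -> [(a, a)]
normalize []  = []
normalize cyc = minimum (rotations cyc)
```

With this, `normalize [(2,3),(3,1),(1,2)]` and `normalize [(1,2),(2,3),(3,1)]` yield the same list, so applying `nub . map normalize` to a list of cycles would collapse rotations of the same cycle.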

josejuan
  • The solution works fine. :-) One thing I don't want is repeated sub-parts, i.e. if one path is [(1,2),(2,3),(3,4)], then I don't want (2,3),(3,4) to be a new path, since it is only a sub-part of the former path. – Sibi Feb 06 '13 at 05:11
  • @Sibi, these are different paths, but you can easily filter them by sorting (longest first) and filtering out tails. – josejuan Feb 06 '13 at 08:00

I think your problem falls into the NP category, since:

A Hamiltonian path, also called a Hamilton path, is a path between two vertices of a graph that visits each vertex exactly once.

In general, the problem of finding a Hamiltonian path is NP-complete (Garey and Johnson 1983, pp. 199-200), so the only known way to determine whether a given general graph has a Hamiltonian path is to undertake an exhaustive search (source)

Your problem is even "harder", since you don't know beforehand what the end node will be.

In terms of data structure, you can try to simulate a hash table in Haskell, since that data type is commonly used for graphs and your problem can be turned into a graph problem.
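
As a rough illustration of the graph view, here is a minimal sketch that uses `Data.Map.Strict` and `Data.Set` from the `containers` package in place of a hash table. The names `buildGraph`, `startNodes`, `pathsFrom` and `maximalPaths` are hypothetical. It assumes that paths start only at nodes with no incoming edge and that a branch stops before it would revisit a node already on the current path. The number of paths can still be exponential in the worst case, and an input that consists only of cycles has no start node and yields no paths.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Adjacency map: each node maps to its direct successors, in input order.
type Graph a = Map.Map a [a]

buildGraph :: Ord a => [(a, a)] -> Graph a
buildGraph edges = Map.fromListWith (flip (++)) [(u, [v]) | (u, v) <- edges]

-- Nodes that never appear as the second element of an edge.
startNodes :: Ord a => [(a, a)] -> [a]
startNodes edges = Set.toAscList (Set.difference sources targets)
  where
    sources = Set.fromList (map fst edges)
    targets = Set.fromList (map snd edges)

-- All maximal paths from node u, as edge lists. A branch ends when u has
-- no successor that is not already on the current path.
pathsFrom :: Ord a => Graph a -> Set.Set a -> a -> [[(a, a)]]
pathsFrom g seen u =
  case [v | v <- Map.findWithDefault [] u g, not (Set.member v seen')] of
    []   -> [[]]
    next -> [(u, v) : rest | v <- next, rest <- pathsFrom g seen' v]
  where
    seen' = Set.insert u seen

maximalPaths :: Ord a => [(a, a)] -> [[(a, a)]]
maximalPaths edges = concatMap (pathsFrom g Set.empty) (startNodes edges)
  where
    g = buildGraph edges
```

On the input from the question, `maximalPaths [(1,2),(2,3),(3,4),(9,11),(4,5),(5,6),(6,7),(3,9)]` gives `[[(1,2),(2,3),(3,4),(4,5),(5,6),(6,7)],[(1,2),(2,3),(3,9),(9,11)]]`. Building the map costs O(n log n) for n edges when node out-degrees are small, so 400,000 tuples are no problem for that step; the enumeration itself is what can blow up.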

dreamcrash
  • Sibi's problem isn't in NP: you can't verify a solution in polynomial time (I think that, in general, there is an exponential number of paths for a given input set, so you can't verify it in P) – josejuan Feb 05 '13 at 12:39