0

Problem

I have a list of line segments:

exampleLineSegments = [(1,2),(2,3),(3,4),(4,5),(5,6),(4,7),(8,7)]

These segments include the indices of the corresponding point in a separate array.

From this sublist, one can see that there is a branching point (4). So three different branches are emerging from this branching point. (In other, more specific problems, there might be / are multiple branching points for n branches.)

Target

My target is to get a dictionary including information about the existing branches, so e.g.:

result = { branch_1: [1,2,3,4],
           branch_2: [4,5,6],
           branch_3: [4,7,8]}

Current state of work/problems

Currently, I am identifying the branch points first by setting up a dictionary for each point and checking for each entry if there are more than 2 neighbor points found. This means that there is a branching point.

Afterwards I am crawling through all points emerging from these branch points, checking for successors etc.

In these functions, there are a some for loops and generally an intensive "crawling". This is not the cleanest solution and if the number of points increasing, the performance is not so good either.

Question

What is the best / fastest / most performant way to achieve the target in this case?

BennyS
  • 35
  • 1
  • 7

1 Answers1

0

I think you can achieve it by following steps:

  1. use a neighbors dict to store the graph
  2. find all branch points, which neighbors count > 2
  3. start from every branch point, and use dfs to find all the paths
from collections import defaultdict
def find_branch_paths(exampleLineSegments):
    # use dict to store the graph
    neighbors = defaultdict(list)
    for p1, p2 in exampleLineSegments:
        neighbors[p1].append(p2)
        neighbors[p2].append(p1)

    # find all branch points
    branch_points = [k for k, v in neighbors.items() if len(v) > 2]

    res = []

    def dfs(cur, prev, path):
        # reach the leaf
        if len(neighbors[cur]) == 1:
            res.append(path)
            return
        for neighbor in neighbors[cur]:
            if neighbor != prev:
                dfs(neighbor, cur, path + [neighbor])

    # start from all the branch points
    for branch_point in branch_points:
        dfs(branch_point, None, [branch_point])

    return res

update an iteration version, for big data, which may cause a recursion depth problem:

def find_branch_paths(exampleLineSegments):
    # use dict to store the graph
    neighbors = defaultdict(list)
    for p1, p2 in exampleLineSegments:
        neighbors[p1].append(p2)
        neighbors[p2].append(p1)

    # find all branch points
    branch_points = [k for k, v in neighbors.items() if len(v) > 2]

    res = []
    # iteration way to dfs
    stack = [(bp, None, [bp]) for bp in branch_points]
    while stack:
        cur, prev, path = stack.pop()

        if len(neighbors[cur]) == 1 or (prev and cur in branch_points):
            res.append(path)
            continue

        for neighbor in neighbors[cur]:
            if neighbor != prev:
                stack.append((neighbor, cur, path + [neighbor]))

    return res

test and output:

print(find_branch_paths([(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (4, 7), (8, 7)]))
# output: 
# [[4, 3, 2, 1], [4, 5, 6], [4, 7, 8]]

Hope that helps you, and comment if you have further questions. : )

UPDATE: if there are many branch points, the path will grow exponentially. So if you only want distinct segments, you can end the path when encounter another branch point.

change this line
if len(neighbors[cur]) == 1:
to
if len(neighbors[cur]) == 1 or (prev and cur in branch_points):

recnac
  • 3,744
  • 6
  • 24
  • 46
  • hi @recnac , first: Thank you very very much. This code looks so clean also works fine (and is really fast!). Unfortunately, i was moving to my largest problem and encountered a recursion depth problem (never had one before). Is there a good workaround to avoid this error? File "", line 32, in dfs if len(neighbors[cur]) == 1 or (prev and cur in branch_points): RecursionError: maximum recursion depth exceeded in comparison I googled and found something like import sys sys.setrecursionlimit(3000) But is there a "smarter" solution? – BennyS Apr 09 '19 at 13:31
  • how large of your data? If path length is longer than maximum recursion depth, you can use `iteration` version instead of `recursion` version. Or implement your own `stack` instead of using system function stack. – recnac Apr 09 '19 at 13:43
  • if you data is large, you only can record distinct path like I said above. – recnac Apr 09 '19 at 13:43
  • I will write a `iteration` version for you. – recnac Apr 09 '19 at 13:45
  • try the `iteration` version : ) @BennyS – recnac Apr 09 '19 at 13:54
  • That is so awesome! It works, also without the "sys solution". Thank you very very much recnac! I will try it more extensive next week :) – BennyS Apr 09 '19 at 14:34