Function takes a long time

Question

I'm currently trying to get the number of unique paths of maximum length from node 1 to node N in a weighted directed acyclic graph. I have worked out how to get the max length, but I am stuck on getting the NUMBER of paths of that max length...

Data is inputted like this:

91 120  # Number of nodes, number of edges

1 2 34

1 3 15

2 4 10

.... where each line means, for example, Node 1 -> Node 2 with a weight of 34.

I store my data in a dictionary, so my dict looks like:
_distance = {}
_distance = {1: [(2, 34), (3, 15)], 2: [(4, 10)], 3: [(4, 17)], 4: [(5, 36), (6, 22)], 5: [(7, 8)],... etc.
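
A minimal sketch of how input in this format could be read into such a dictionary. The name `parse_graph` is hypothetical, the code is written for Python 3, and it assumes the first non-blank line holds the node and edge counts, blank lines can be skipped, and anything after a `#` is an annotation:

```python
def parse_graph(text):
    """Parse 'N M' followed by M lines of 'src dst weight' into a dict.

    Returns (num_nodes, distances), where distances maps each source node
    to a list of (destination, weight) tuples.
    """
    lines = []
    for raw in text.splitlines():
        # Drop trailing annotations and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if line:
            lines.append(line)

    num_nodes, num_edges = (int(tok) for tok in lines[0].split())

    distances = {}
    for line in lines[1:1 + num_edges]:
        src, dst, weight = (int(tok) for tok in line.split())
        distances.setdefault(src, []).append((dst, weight))
    return num_nodes, distances
```

Nodes without outgoing edges (such as the last node) get no key in the dictionary, so lookups for them need `distances.get(node, [])`.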

I have worked out how to get the longest path length using this:

First I make a list of vertices:

class Vertice:
    def __init__(self,name,weight=0,visted=False):
        self._n = name
        self._w = weight
        self._visited = visted
        self.pathsTo = 0  # number of max-length paths reaching this node

_nodes = []
for i in range(numberOfNodes):  # List of vertices (indices 0..n-1)
    _V = Vertice(i)
    _nodes.append(_V)

Next I iterate through my dictionary, setting each node to the maximum weight it can be:

for vert, neighbors in _distance.iteritems():
    _vert = _nodes[vert-1]  # Current vertex; the array starts at 0, so use n-1

    for x, y in neighbors:  # x = neighbor, y = weight of the edge to it
        _v = _nodes[x-1]    # Node #1 is array[0]

        if _v._visited == True:
            if _v._w > _vert._w+y:
                _v._w = _v._w
            else:
                _v._w = y + _vert._w
        else:
            _v._w = y + _vert._w
            _v._visited = True

With this done, the last node holds the maximum weight, so I can just call

max_length = _nodes[-1]._w

to get the max weight. This seems to run fast and has no trouble finding the max path length even on the bigger data set. I then pass my max value into this function:

#  Start from first node in dictionary, distances is our dict{}
#  Target is the last node in the list of nodes, or the total number of nodes.
numLongestPaths(currentLocation=1,target=_numNodes,distances=_distance,maxlength=max_length)

def numLongestPaths(currentLocation,maxlength, target, sum=0, distances={}):
    _count = 0

    if currentLocation == target:
        if sum == maxlength:
            _count += 1
    else:
        for vert, weight in distances[currentLocation]:
            newSum = sum + weight
            currentLocation = vert
            _count += numLongestPaths(currentLocation,maxlength,target,newSum,distances)

    return _count

Once we have hit the end node, I simply check whether our current sum is the max. If it is, I add one to the count; if not, I pass.

This works instantly for inputs such as 8 nodes with a longest path of 20 (finding 3 paths), and for 100 nodes with a longest path of 149 and only 1 unique path of that length. But when I try a data set with 91 nodes, where the longest path is 1338 and there are 32 unique paths, the function takes extremely long. It works, but it is very slow.

Can someone give me some tips on what is wrong with my function that makes it take so long to find the number of paths of length X from 1..N? I'm assuming it has exponential run time, but I'm unsure how to fix it.

Thank you for your help!

EDIT: Okay, I was overthinking this and going about it the wrong way. I restructured my approach, and my code is now as follows:

# BEGIN SEARCH.
    for vert, neighbors in _distance.iteritems():
        _vert = _nodes[vert-1] # Current vertice array starts at 0, so n-1


        for x,y in neighbors:  # neighbores

            _v = _nodes[x-1]   # Node #1 will be will be array[0]

            if _v._visited == True:
                if _v._w > _vert._w+y:
                    _v._w = _v._w
                elif _v._w == _vert._w+y:
                    _v.pathsTo += _vert.pathsTo
                else:
                    _v.pathsTo = _vert.pathsTo
                    _v._w = y + _vert._w

            else:

                _v._w = y + _vert._w
                _v.pathsTo = max(_vert.pathsTo, _v.pathsTo + 1)
                _v._visited = True

I added a pathsTo variable to my Vertice class, which holds the number of unique paths of MAX length.

Posting your question in https://cs.stackexchange.com/ will give more interesting answers. — Barney, Jan 26 '18 at 02:41
Cross-posted: https://stackoverflow.com/q/48454870/781723, https://cs.stackexchange.com/q/87343/755. Please [do not post the same question on multiple sites](https://meta.stackexchange.com/q/64068). Each community should have an honest shot at answering without anybody's time being wasted. — D.W., Jan 26 '18 at 21:49
@Barney, I have a request. In the future, if you're going to suggest another site, can you please let the poster know not to cross-post? You can suggest they delete the question here before posting elsewhere, and remind them that they need to tailor the question to the specific site. This will provide a better experience for all. Also, I see you left your comment after this question already got one answer here. I don't think you should suggest another site after it has already been answered here. (continued) — D.W., Jan 26 '18 at 21:51
Finally, please don't suggest another site unless you know its scope well. Coding questions are off-topic on CS.SE, and questions asking about specific code snippets are off-topic there. The first thing we ask people to do is rewrite their question to describe their algorithm (e.g., via concise pseudocode) instead of showing code. I don't see that you have an account on CS.SE. It might be better to avoid suggesting sites that you're not active on, as it's less likely you'll know what kinds of questions will be well-received there if you're not active there yourself. Thank you for listening! — D.W., Jan 26 '18 at 21:52
Sorry about the cross-post; I don't know how to remove it, as I posted as a guest on cs.exchange — Liverlips, Jan 26 '18 at 23:42

user2357112 · Accepted Answer · 2018-01-26T02:42:03.567

3

Your numLongestPaths is slow because you're recursively trying every possible path, and there can be exponentially many of those. Find a way to avoid computing numLongestPaths for any node more than once.
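
As a rough sketch of that idea (not necessarily what the answerer had in mind): each call can return both the longest distance from a node to the target and the number of paths achieving it, with the result cached per node so that every node is evaluated only once. The names `count_longest_paths` and `best_from` are made up for this illustration, the code is written for Python 3, and it assumes the graph is acyclic and stored in the question's `{node: [(neighbor, weight), ...]}` form.

```python
from functools import lru_cache

def count_longest_paths(distances, source, target):
    """Return (max_length, number_of_paths) from source to target in a DAG.

    distances maps node -> list of (neighbor, weight), as in the question.
    Returns (None, 0) if the target cannot be reached from source.
    """
    @lru_cache(maxsize=None)  # each node's result is computed at most once
    def best_from(node):
        # Base case: the empty path from the target to itself.
        if node == target:
            return (0, 1)
        best_len, best_count = None, 0
        for nxt, weight in distances.get(node, []):
            sub_len, sub_count = best_from(nxt)
            if sub_count == 0:
                continue  # nxt cannot reach the target at all
            length = weight + sub_len
            if best_len is None or length > best_len:
                # Strictly longer: restart the count from this branch.
                best_len, best_count = length, sub_count
            elif length == best_len:
                # Tie: paths through this branch are additional longest paths.
                best_count += sub_count
        return (best_len, best_count)

    return best_from(source)
```

On a diamond-shaped graph with edges 7->8, 7->9, 8->10 and 9->10, all of weight 1, `count_longest_paths(graph, 7, 10)` returns `(2, 2)`: both branches are counted, yet node 10 is evaluated only once. For very deep graphs the recursion could hit Python's recursion limit.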

Also, your original _w computation is broken, because when it computes a node's _w value, it does nothing to ensure the other _w values it's relying on have themselves been computed. You will need to avoid using uninitialized values; a topological sort may be useful, although it sounds like the vertex labels may have already been assigned in topological sort order.
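
To illustrate the topological-sort suggestion, here is a minimal sketch using Kahn's algorithm, which processes a node only after all of its predecessors, so the values it reads are already final. The function name `longest_path_counts` is hypothetical, the code is written for Python 3, and it assumes nodes are labeled 1..num_nodes and the graph is acyclic.

```python
from collections import deque

def longest_path_counts(distances, num_nodes, source=1):
    """Longest distance from source, and number of longest paths, per node.

    distances maps node -> [(neighbor, weight), ...].
    Returns (dist, paths), two lists indexed by node label (index 0 unused).
    dist[v] is None when v is unreachable from source.
    """
    # Count incoming edges for every node.
    indegree = [0] * (num_nodes + 1)
    for node in distances:
        for nxt, _ in distances[node]:
            indegree[nxt] += 1

    dist = [None] * (num_nodes + 1)
    paths = [0] * (num_nodes + 1)
    dist[source], paths[source] = 0, 1

    # Start with the nodes that have no incoming edges.
    ready = deque(v for v in range(1, num_nodes + 1) if indegree[v] == 0)
    while ready:
        node = ready.popleft()
        for nxt, weight in distances.get(node, []):
            # dist[node] is final here: all predecessors were processed.
            if dist[node] is not None:
                candidate = dist[node] + weight
                if dist[nxt] is None or candidate > dist[nxt]:
                    dist[nxt], paths[nxt] = candidate, paths[node]
                elif candidate == dist[nxt]:
                    paths[nxt] += paths[node]
            # This edge is consumed; nxt is ready once all are consumed.
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return dist, paths
```

Because the processing order comes from the in-degree counts rather than from dictionary iteration order, the result does not depend on how the vertices happen to be labeled or stored.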

edited Jan 26 '18 at 02:42

answered Jan 26 '18 at 02:32

user2357112


Thank you for the reply. What if, say, Node 7 points to Node 8 and Node 9, each with a weight of 1, and then Node 8 and Node 9 both point to Node 10, each with the same weight? If I stop searching after the 7->8->10 path, wouldn't I miss the 7->9->10 path, which has the same weight? So wouldn't I need to compute a node more than once? – Liverlips Jan 26 '18 at 03:10

@Liverlips: Keep thinking about it. You'll need to learn the skills to figure this stuff out on your own; I don't want to spell out everything for you. – user2357112 Jan 26 '18 at 03:15
Hey @user2357112, I thought about this all day today and yesterday and read about some more approaches. I updated my post with my new code, which seems to work great. But is my dictionary not already in topological order, since it is working with all my test cases now? – Liverlips Jan 26 '18 at 23:38

score 0 · Answer 2 · answered Jan 26 '18 at 02:49

In addition to @user2357112's answer, here are two further recommendations:

Language

If you want this code to be as efficient as possible, I recommend using C. Python is a great scripting language, but it is slow compared to compiled alternatives.

Data-structure

Nodes are named in an ordered fashion, so you can optimize your code considerably by using a list instead of a dictionary, i.e.

_distance = [[] for i in range(_length)]
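
A minimal sketch of how such a list could be filled, assuming nodes are labeled 1..N and the edges are available as (source, destination, weight) triples. The name `build_adjacency` is hypothetical, and the code is written for Python 3. Index 0 is left unused so that a node label can be used directly as an index:

```python
def build_adjacency(num_nodes, edges):
    """Build a list-based adjacency structure for nodes labeled 1..num_nodes.

    edges is an iterable of (source, destination, weight) triples.
    adjacency[v] is the list of (neighbor, weight) pairs leaving node v.
    """
    # One extra slot so that adjacency[num_nodes] is a valid index.
    adjacency = [[] for _ in range(num_nodes + 1)]
    for src, dst, weight in edges:
        adjacency[src].append((dst, weight))
    return adjacency
```

Unlike the dictionary, every node has an entry here, so a node without outgoing edges simply maps to an empty list.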
