0

So, I do understand depth-first-search is not appropriate for this problem and something like UCS or Astar would be a lot better, just attempting to see if it is possible with a DFS approach.

I need to find a path within a cost budget, my approach using dfs is to keep the cost to get to the next node in the stack as they get pushed, and if going to the next node exceeds, it ignores does not push.

The problem I am facing is that when the budget is exceeded, I am struggling to find a way to set the nodes that have led to this path to be set back as unvisited so newer paths can consider them. I feel it has something to do with setting visited/unvisited properly and have tested a couple but in a bigger graph input, it always fails to find a path within constraint (Have confirmed that it exist using other methods)

def DFS(start, end, budget, adj_list, dist_list, cost_list):
# TODO Depth-First Search
print(f"DFS path from {start} to {end} with budget of {budget}...")

# Set all initial to not visited
visited = [False] * (len(adj_list) + 1)
parent = [None] * (len(adj_list) + 1)
stack = deque()

# Push starting node into the stack
stack.append((start, 0))
parent[start] = -1

path = deque()
pathFound = False

while (len(stack)):
    next_node_tup = stack.pop()
    next_node = int(next_node_tup[0])
    energy_used = next_node_tup[1]

    # Found target, proceed to print path
    if (next_node == end):
        pathFound = True    
        break

    # Set to visited if not yet
    # if (not visited[next_node]):
    #     visited[next_node] = True
    
    # Add all connecting nodes to top of stack
    edgeNodes = adj_list[str(next_node)]
    #print(f"There are {len(edgeNodes)} nodes connecting to {next_node}")

    for node in edgeNodes:
        node = int(node)
        #print(f"Node {node} connecting to node {next_node}")

        if (not visited[node]):
            # If taking this node exceeds budget, we dont want it
            minipath = str(next_node) + "," + str(node)
            print(f"Cost to go from {next_node} to {node} is {cost_list[minipath]}")
            energy_if_take = energy_used + cost_list[minipath]
            print(f"Energy used if path taken is {energy_if_take}")
            if (energy_if_take <= budget):
                parent[node] = next_node
                stack.append((node, energy_if_take))
                visited[node] = True
            else:
                print(f"Budget exceeded")
                # ! Need to set backtracked nodes to not visited
                # ! Problem is here!
                visited[node] = False
        
    print(stack)

if pathFound:
    curr = end
    path = []
    distance = 0
    energy = 0
    print("Printing path")
    while(curr != -1):
        path.append(curr)
        if (parent[curr] != -1):
            miniPath = str(parent[curr]) + "," + str(curr)
            distance += dist_list[miniPath]
            energy += cost_list[miniPath]
        curr = parent[curr]

    # Reverse and print path
    path.reverse()
    print(*path, sep = " -> ")
    print(f"Number of jumps is {len(path)}")
    print(f"Distance from {start} to {end} is {distance}")
    print(f"Cost from {start} to {end} is {energy}")
else:
    print("No path that satisfies budget")

Any hints are suggestions would be appreciated, been stuck here for quite a few days :(

  • Now you understand why DFS is not so appropriate for this problem. – Scott Hunter Feb 14 '22 at 20:50
  • Why don't you use recursion for your DFS? It will make things a lot easier. – trincot Feb 14 '22 at 21:00
  • @trincot I'll try that out, I generally write iterative as its easier to hand trace, plus my sample input has over 250K nodes/ 750K edges, I was abit afraid the recursive stack might be too large. – KerubinGema Yuen Feb 14 '22 at 21:09
  • Yes, with that size of a graph, recursion is not a viable option. – trincot Feb 14 '22 at 21:10
  • 1
    I find the constant conversion between int and str confusing. Why not stick to one of both? Also, can you add some driver code, which constructs a small sample graph, we can test with? – trincot Feb 14 '22 at 21:19
  • The input file given to us used a string as key, probably because the key format for cost and dist was "source,destination", I converted the single value key to an int and kept my structures consistent throughout, only converting back to string when retrieving the cost or distance – KerubinGema Yuen Feb 14 '22 at 21:27
  • Why did you convert to int? – trincot Feb 14 '22 at 21:35
  • Some driver code / an example failing input would be very helpful. It looks like the problem is that you're setting `visited[node]` to true when you've only put that node on the stack to be visited later. That works for recursive DFS, but not here. You'll need to set next_node as visited, push all the unvisited neighbors of `next_node` on the stack, and figure out a way to set next_node as 'unvisited' after its last neighbor has been popped off of the stack. In a normal m-ary tree, this is done by maintaining parent, left and right sibling pointers at each node. – kcsquared Feb 15 '22 at 00:12

1 Answers1

1

Here is a modified version of your code, with a test case, that addresses the issue(s) in your question.

Assumptions:

  • The graph is directed, with edges as per adj_list;
  • The graph may have cycles (hence the desire to track a prior encounter using visited);
  • We want to traverse the graph using DFS.

Key logic:

  • Append start node to stack with waiting_for_adj_list flag set to False
  • In the DFS loop, pop a node from stack and either (1) mark it as visited, re-append it to stack with waiting_for_adj_list flag set to True and append its children to stack (subject to detection of a cycle or a broken budget) or (2) reset visited status for the node and its children.
  • Exit early upon reaching end node at or under budget.

Note that I have taken some liberties in type usage (list instead of deque for stack, dict instead of list for parent and visited, maybe a couple of others) to avoid complexity and/or complications outside the focus of your question.

    def DFS(start, end, budget, adj_list, dist_list, cost_list):
        visited = {start: True}
        parent = {}
        path = []
        pathFound = False
        stack = [(start, 0, False)] #node, energy, waiting_for_adj_list
        while stack:
            next_node, energy_used, waiting_for_adj_list = stack.pop()
            next_node = int(next_node)
            if next_node == end:
                pathFound = True   
                print(f"Path found at cost {energy_used}: {path + [next_node]}")
                break
            if waiting_for_adj_list:
                # now that we're done with next_node and its children, mark children unvisited
                edgeNodes = adj_list[str(next_node)]
                for node in edgeNodes:
                    node = int(node)
                    if parent[node] == next_node:
                        visited[node] = False
                visited[next_node] = False
                print(f"done traversing {next_node}, visited {[node for node, v in visited.items() if v]}")
                path.pop()
                continue
            stack.append((next_node, energy_used, True)) # append to stack again with waiting_for_adj_list == True
            visited[next_node] = True
            path.append(next_node)
            edgeNodes = adj_list[str(next_node)]
            for node in edgeNodes:
                node = int(node)
                if node in visited and visited[node]:
                    # detected a cycle, don't follow it
                    print(f"Cycle detected: {path + [node]}")
                else:
                    minipath = str(next_node) + "," + str(node)
                    energy_if_take = energy_used + cost_list[minipath]
                    if (energy_if_take <= budget):
                        parent[node] = next_node
                        stack.append((node, energy_if_take, False))
                    else:
                        print(f"Budget {budget} exceeded at cost {energy_if_take} for path {path + [node]}")
                        # node is still available to be explore from other parents, but don't put on stack
            print(f"stack {stack}")
            print(f"path {path}")
            
        if pathFound:
            curr = end
            path = []
            distance = 0
            energy = 0
            print("Printing path")
            while(curr != -1):
                path.append(curr)
                if (curr in parent):
                    miniPath = str(parent[curr]) + "," + str(curr)
                    distance += dist_list[miniPath]
                    energy += cost_list[miniPath]
                curr = parent[curr] if curr in parent else -1

            # Reverse and print path
            path.reverse()
            print(*path, sep = " -> ")
            print(f"Number of jumps is {len(path)}")
            print(f"Distance from {start} to {end} is {distance}")
            print(f"Cost from {start} to {end} is {energy}")
        else:
            print("No path that satisfies budget")
            
    # Test case
    start, end = 0, 7
    budget = 200
    adj_list = {
        '0':{1, 2, 3},
        '1':{4}, '2':{4}, '3':{4},
        '4':{5, 6},
        '5':{7},
        '6':{7, 8},
        '7':{},
        '8':{4}
    }
    #4,6,8,4 is a cycle
    cost_list = {
        '0,1':10,
        '0,2':20,
        '0,3':30,
        '1,4':40,
        '2,4':400,
        '3,4':40,
        '4,5':50,
        '4,6':60,
        '5,7':100,
        '6,7':700,
        '6,8':8,
        '8,4':40
    }
    #0,3,4,6,8,4 has a cycle
    #0,3,4,6,7 short-circuits at 830 cost
    #0,3,4,5,7 short-circuits at 220 cost
    #0,2,4 short-circuits at 420 cost
    #0,1,4,6,8,4 is a cycle
    #0,1,4,5,7 costs 200
    dist_list = {
        '0,1':1,
        '0,2':1,
        '0,3':1,
        '1,4':1,
        '2,4':1,
        '3,4':1,
        '4,5':1,
        '4,6':1,
        '5,7':1,
        '6,7':1,
        '6,8':1,
        '8,4':1            
    }
    DFS(start, end, budget, adj_list, dist_list, cost_list)

This code:

  • Extends the tuple on the stack to have a third element waiting_for_adj_list, so when we pop a next_node we immediately append it again with this flag set to True, or if already True we do some clean-up for the children of next_node.
  • Uses visited purely to detect cycles, and clears it away for children as part of the aforementioned clean-up, so that if nodes downstream of next_node are also on other paths for nodes upstream of next_node, they can be traversed and the cost analysis particular to such paths can proceed unimpeded.
  • On detecting a cycle, takes no special action except to break the cycle.

Test case stdout is:

stack [(0, 0, True), (1, 10, False), (2, 20, False), (3, 30, False)]
path [0]
stack [(0, 0, True), (1, 10, False), (2, 20, False), (3, 30, True), (4, 70, False)]
path [0, 3]
stack [(0, 0, True), (1, 10, False), (2, 20, False), (3, 30, True), (4, 70, True), (5, 120, False), (6, 130, False)]
path [0, 3, 4]
Budget 200 exceeded at cost 830 for path [0, 3, 4, 6, 7]
stack [(0, 0, True), (1, 10, False), (2, 20, False), (3, 30, True), (4, 70, True), (5, 120, False), (6, 130, True), (8, 138, False)]
path [0, 3, 4, 6]
Cycle detected: [0, 3, 4, 6, 8, 4]
stack [(0, 0, True), (1, 10, False), (2, 20, False), (3, 30, True), (4, 70, True), (5, 120, False), (6, 130, True), (8, 138, True)]
path [0, 3, 4, 6, 8]
done traversing 8, visited [0, 3, 6]
done traversing 6, visited [0, 3]
Budget 200 exceeded at cost 220 for path [0, 3, 4, 5, 7]
stack [(0, 0, True), (1, 10, False), (2, 20, False), (3, 30, True), (4, 70, True), (5, 120, True)]
path [0, 3, 4, 5]
done traversing 5, visited [0, 3]
done traversing 4, visited [0, 3]
done traversing 3, visited [0]
Budget 200 exceeded at cost 420 for path [0, 2, 4]
stack [(0, 0, True), (1, 10, False), (2, 20, True)]
path [0, 2]
done traversing 2, visited [0]
stack [(0, 0, True), (1, 10, True), (4, 50, False)]
path [0, 1]
stack [(0, 0, True), (1, 10, True), (4, 50, True), (5, 100, False), (6, 110, False)]
path [0, 1, 4]
Budget 200 exceeded at cost 810 for path [0, 1, 4, 6, 7]
stack [(0, 0, True), (1, 10, True), (4, 50, True), (5, 100, False), (6, 110, True), (8, 118, False)]
path [0, 1, 4, 6]
Cycle detected: [0, 1, 4, 6, 8, 4]
stack [(0, 0, True), (1, 10, True), (4, 50, True), (5, 100, False), (6, 110, True), (8, 118, True)]
path [0, 1, 4, 6, 8]
done traversing 8, visited [0, 6, 1]
done traversing 6, visited [0, 1]
stack [(0, 0, True), (1, 10, True), (4, 50, True), (5, 100, True), (7, 200, False)]
path [0, 1, 4, 5]
Path found at cost 200: [0, 1, 4, 5, 7]
Printing path
0 -> 1 -> 4 -> 5 -> 7
Number of jumps is 5
Distance from 0 to 7 is 4
Cost from 0 to 7 is 200

There are undoubtedly optimization opportunities remaining, but based on my understanding of your question and objective as outlined in the assumptions in this answer, I believe this code should find a path within a cost budget using DFS.

constantstranger
  • 9,176
  • 2
  • 5
  • 19
  • I love the intuition behind this solution! I initially tried adding a third value that holds possible paths left which would be the connecting nodes, and when one failed it would subtract 1, and if it hit 0 it would remove the node which could chain upwards. But that was ineffective as I ended up messing up the order of the stack. Using a bool solves that isuee perfectly – KerubinGema Yuen Feb 15 '22 at 02:06
  • I'm still hitting an infinite loop in the large graph test case but I think some minor tracing should find out what is causing it, will try it out after my classes later on, thanks for the help! – KerubinGema Yuen Feb 15 '22 at 02:07
  • 1
    There's a bug in the `if waiting_for_adj_list` block: you should only set `visited[node] = False` if `parent[node] == next_node`. Otherwise, you'll mark previous nodes in your current path as unvisited, and probably enter some cycles as well. – kcsquared Feb 15 '22 at 18:34
  • @kcsquared Nice catch, I've updated my answer with your fix. – constantstranger Feb 15 '22 at 18:46
  • @KerubinGema Yuen, do you need more help getting my answer to work? – constantstranger Feb 20 '22 at 22:51
  • @constantstranger Yup! It worked, issue was I wasn't giving the algorithm enough time to run – KerubinGema Yuen Feb 24 '22 at 13:32