More efficient way of running a random traversal of a directed graph with Networkx

Question

I am trying to simulate a random traversal through a directed networkx graph. The pseudocode is as follows:

Create graph G with nodes holding the value true or false. 
// true -> visited, false -> not visited

pick random node N from G
mark N as visited
save N.successors as templist
while true
    nooptions = false
    while true
        if templist is empty
            nooptions = true
            break
        pick random node M from templist
        if M has not been visited
            break
        remove M from templist
    if nooptions = true
        break
    mark M as visited
    N = M
    save N.successors as templist

Is there a more efficient way of marking a path as traveled than creating a temporary list and removing elements from it when they are marked as visited?

EDIT

The goal of the algorithm is to pick a node at random in the graph, then pick a random successor/child of that node. If it is unvisited, go there and mark it as visited. Repeat until the current node has no successors/children, or none that are unvisited.

What is the goal? To pick a random node, and then pick a random node that it can access, and extract that path? My answer is assuming that... – Corley Brigman, Mar 03 '14 at 15:29
Thanks for replying @CorleyBrigman. I have the code for starting at the source already, and I will try out your last solution later tonight! I estimate the number of nodes to be on the order of a hundred thousand. The goal is to start at a random node and randomly pick a node to access until there are no more nodes to access. The same node should not be accessed more than once. I'm hoping to run the simulation a number of times (to be determined later) and store how many times each node is accessed, so I think efficiency will be a factor. – Linus Liang, Mar 03 '14 at 16:17
I'm just checking my assumptions for "start at a random node and randomly pick a node"... you mean, start at a node, pick a random successor, then pick one of that node's successors, etc., until you get to a node with no successors? – Corley Brigman, Mar 03 '14 at 16:28
Yes, until I get to a node with no successors or all the successors have already been marked as visited. – Linus Liang, Mar 03 '14 at 16:51

score 4 · Accepted Answer · answered Mar 03 '14 at 15:39

Depending on the size of your graph, you could use the built-in all_pairs_shortest_path function. Your function would then be basically:

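import random
import networkx as nx

G = nx.DiGraph()
# <add nodes and edges to G>

# Compute shortest paths between all pairs of nodes.
# dict() also handles versions where this returns an iterator of (source, paths) pairs.
all_paths = dict(nx.all_pairs_shortest_path(G))

# Choose a random source
source = random.choice(list(all_paths))
# Choose a random target that source can access (this may be source itself)
target = random.choice(list(all_paths[source]))
# The random path is
random_path = all_paths[source][target]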

I didn't see a built-in way to generate random paths starting from a given source only, but the Python source is accessible, and adding that feature should be straightforward.
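If only paths from one source are needed, computing all pairs may be unnecessary. The following is a minimal sketch, assuming `nx.single_source_shortest_path`, which returns a dict mapping each reachable target to one shortest path from `source`. The helper name `random_shortest_path_from` and the example graph are made up for illustration.

```python
import random

import networkx as nx


def random_shortest_path_from(G, source, rng=random):
    """Return a shortest path from source to a randomly chosen reachable node.

    Returns None when source cannot reach any other node.
    (Hypothetical helper, not part of networkx.)
    """
    # Maps every reachable target (including source itself) to one shortest path.
    paths = nx.single_source_shortest_path(G, source)
    targets = [t for t in paths if t != source]
    if not targets:
        return None
    return paths[rng.choice(targets)]


# Small illustrative graph (made-up data).
G = nx.DiGraph([(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)])
path = random_shortest_path_from(G, 0)
```

This still returns a shortest path to the chosen target, so it is not a random walk; it only avoids the cost of the all-pairs computation.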

There are two other possibilities, which might be faster but are a little more manual. The first is bfs_successors, which does a breadth-first search and should include each target node only once. I'm not 100% sure of the return format, so it might not be convenient.
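To make the format concrete, here is a small sketch on a made-up graph. It assumes that wrapping the result in `dict()` is acceptable: that works whether `bfs_successors` returns a dict or an iterator of `(node, successors)` pairs, which has differed between networkx versions.

```python
import networkx as nx

# Made-up example graph; note the cycle 3 -> 0 and the two routes to node 3.
G = nx.DiGraph([(0, 1), (0, 2), (1, 3), (2, 3), (3, 0)])

# dict() accepts either a mapping or an iterable of (node, successors) pairs.
succ = dict(nx.bfs_successors(G, 0))

# Flatten the successor lists: every node reachable from 0, excluding 0 itself.
reachable = [child for children in succ.values() for child in children]
```

Each reachable node appears under exactly one parent, so `reachable` contains no duplicates even though node 3 has two incoming routes.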

You could also build a bfs_tree, a cycle-free tree containing the source and every node it can reach. That might actually be simpler, and probably shorter:

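# Assumes `import random` and `import networkx as nx`
# Get a random source from G (iterating a graph yields its nodes)
source = random.choice(list(G))

# BFS tree rooted at source: contains source and every node reachable from it
min_tree = nx.bfs_tree(G, source)

# Accessible nodes are all nodes in the tree except source itself
all_accessible = list(min_tree)
all_accessible.remove(source)
# Note: random.choice raises IndexError if source cannot reach any other node
target = random.choice(all_accessible)

random_path = nx.shortest_path(G, source, target)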

Corley Brigman


Why do you remove the source from the bfs_tree? – Linus Liang Mar 04 '14 at 00:39
For your last solution, I don't necessarily want to take the shortest_path from one node to another. Will this allow a truly random walk? Also, if my target is a random choice, say the parent is the source and the child is the target, will I only walk one edge even though the target may have more children? Thanks for your help so far! – Linus Liang Mar 04 '14 at 00:48
The `bfs_tree` includes the source and all reachable targets. The format of a `DiGraph` is that `G.node` is a dictionary, so `G.node.keys()` is a list of the source plus all targets. To pick a random target, you can use `random.choice` with that list, but you don't want the source, so it can be removed. – Corley Brigman Mar 04 '14 at 04:24
You are right: with that last solution, you will only get the shortest path between that random source and target. The benefit is that you have picked a random source and target that you know have some path between them. But if you want a truly random path, then your original algorithm works; I would just use a set instead of a list, since checking for membership is much cheaper. You could also use my method with `all_simple_paths`, which gives all paths without cycles: create a list and pick a random entry. That will be pretty evenly random, but probably not very efficient... – Corley Brigman Mar 04 '14 at 04:27
Great, thank you very much for your help! I'll switch to using a set for my original algorithm. Your algorithm works great otherwise! I just needed a more evenly random walk that will reach the "end" (no more successors or no more unvisited successors). – Linus Liang Mar 04 '14 at 14:31
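The set-based version of the original algorithm discussed in these comments might look like the sketch below. It is one possible reading of the pseudocode in the question, not code from the thread: the function name `random_walk`, its parameters, and the example graph are assumptions made for illustration.

```python
import random

import networkx as nx


def random_walk(G, start=None, rng=random):
    """Walk from start along random unvisited successors until stuck.

    Hypothetical helper: returns the list of nodes in the order visited.
    """
    if start is None:
        start = rng.choice(list(G))  # iterating a graph yields its nodes
    visited = {start}  # a set gives cheap membership checks
    path = [start]
    current = start
    while True:
        # Only successors that have not been visited yet are candidates.
        options = [n for n in G.successors(current) if n not in visited]
        if not options:
            break  # no successors, or all of them already visited
        current = rng.choice(options)
        visited.add(current)
        path.append(current)
    return path


# Made-up example graph with a cycle 0 -> 2 -> 0.
G = nx.DiGraph([(0, 1), (0, 2), (1, 2), (2, 0), (2, 3)])
walk = random_walk(G, start=0)
```

Filtering the successors once per step replaces the repeated pick-and-remove loop on a temporary list. To count how often each node is reached over many runs, the returned paths could be fed into a `collections.Counter`.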
