
I was comparing the efficiency of breadth-first search (BFS) and the Floyd–Warshall algorithm for solving the shortest-path problem in Python. Although the complexity of Floyd–Warshall is much larger than that of BFS, BFS seems to be taking more time.

def BFS_ForNode(query_node, parents):
    result = {}
    queue = []
    queue.append((query_node, 0))
    while queue:
        node, dist = queue.pop(0)
        result[node] = dist
        if node in parents:
            for parent in parents[node]:
                queue_members = [x[0] for x in queue]
                if parent not in result and parent not in queue_members:
                    queue.append((parent, dist + 1))
    return result

def BFS(Graph, vertex_num):
    for i in range(vertex_num):
        BFS_ForNode(i, Graph)

def floydWarshall(graph, V):
    # Copy the matrix row by row; in Python 3 a nested map() returns an
    # iterator that is not subscriptable, so build lists instead.
    dist = [list(row) for row in graph]
    for k in range(V):
        for i in range(V):
            for j in range(i):  # lower triangle only; assumes a symmetric graph
                dist[i][j] = min(dist[i][j],
                                 dist[i][k] + dist[k][j])
                dist[j][i] = dist[i][j]
    return dist

Is there some data structure I am using for BFS that makes it very slow, and is there any way to optimize it?

cs95
Aya Abdelsalam
    Without analyzing those algorithms: there are a lot of potentially slow operations in this BFS code, while Floyd–Warshall is just a simple nested loop with very simple operations (depending a bit on the data structure of dist). Examples: ```node in parents``` might be a linear search, depending on parents; ```parent in parents[node]```: same; ```[x[0] for x in queue]``` rebuilds a list from the queue on every iteration, costing a linear scan each time; ```if parent not in result and parent not in queue_members``` is one hash-table lookup (good asymptotics but some overhead) plus one linear search. **Use a profiler to see what's slow** – sascha Sep 08 '17 at 11:42

2 Answers


I suppose this question is no longer relevant for the OP, but it may still help future readers.

Original question:

Is there some data structure I am using for BFS that makes it very slow and is there anyway to optimize it

The answer from my point of view: yes. There are a couple of performance issues in your example:

  1. You are using a list instead of a proper collection like deque (from the collections module) to store your items. When you call pop(0), the list has to shift every remaining item one position to the left, which makes each pop an O(n) operation.

  2. Also, checking whether an item exists in a list is O(n). If you use a set() instead, the check takes only O(1) on average, because a set is based on a hash table (like dict), so membership tests run in constant time.
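As a quick illustration of both points (a minimal sketch; the size n and the exact timings are hypothetical and depend on your machine):

```python
from collections import deque
import timeit

n = 20_000

def drain_list():
    q = list(range(n))
    while q:
        q.pop(0)          # shifts every remaining element: O(n) per pop

def drain_deque():
    q = deque(range(n))
    while q:
        q.popleft()       # O(1) per pop

list_time = timeit.timeit(drain_list, number=1)
deque_time = timeit.timeit(drain_deque, number=1)
print(f"list.pop(0): {list_time:.4f}s, deque.popleft(): {deque_time:.4f}s")

# Membership tests: O(n) in a list, O(1) on average in a set.
seen_list = list(range(n))
seen_set = set(seen_list)
assert (n - 1 in seen_list) == (n - 1 in seen_set)
```

On a typical machine the deque drain finishes orders of magnitude faster than the list drain, even though both produce the same FIFO order.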

If you're interested, from my perspective the best implementation for BFS is:

from collections import deque

def bfs(graph, start):
    """BFS which will collect levels/distances of each neighbour of start vertex and so on."""
    queue = deque([start])
    distances = {start: 0}
    while queue:
        vertex = queue.popleft()
        for neighbour in graph[vertex]:
            if neighbour not in distances:
                queue.append(neighbour)
                distances[neighbour] = distances[vertex] + 1
    return distances
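For example, on a small hypothetical adjacency-list graph (the function is repeated here so the snippet runs on its own):

```python
from collections import deque

def bfs(graph, start):
    """Same deque-based BFS as above: distances from start to every reachable vertex."""
    queue = deque([start])
    distances = {start: 0}
    while queue:
        vertex = queue.popleft()
        for neighbour in graph[vertex]:
            if neighbour not in distances:
                queue.append(neighbour)
                distances[neighbour] = distances[vertex] + 1
    return distances

# Hypothetical undirected graph: edges 0-1, 0-2, 1-3
graph = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
print(bfs(graph, 0))  # {0: 0, 1: 1, 2: 1, 3: 2}
```

Note that the dict lookup `neighbour not in distances` replaces both the O(n) list scan and the separate `queue_members` rebuild from the original code.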
Visel_chak

Assuming you start BFS by calling BFS(), your BFS routine actually takes more than linear time: it treats every vertex in the graph as a source vertex and restarts a BFS from each one, which makes the running time quadratic even if the body of the loop took constant time (it doesn't).

Additionally, [x[0] for x in queue] and node in parents can each take linear time in the worst case, making your worst-case runtime comparable to, or even greater than, that of Floyd–Warshall.
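To make the comparison fair, an all-pairs variant that restarts a deque-based BFS from every vertex runs in O(V·(V+E)) total, which beats Floyd–Warshall's O(V³) on sparse graphs. A sketch, assuming the graph is a dict mapping each vertex to its neighbour list:

```python
from collections import deque

def all_pairs_bfs(graph):
    """Unweighted shortest paths from every vertex: O(V * (V + E)) total."""
    all_dist = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:          # O(1) dict lookup, not a list scan
                    dist[w] = dist[v] + 1
                    queue.append(w)
        all_dist[source] = dist
    return all_dist

graph = {0: [1], 1: [0, 2], 2: [1]}  # hypothetical path graph 0-1-2
print(all_pairs_bfs(graph)[0])       # {0: 0, 1: 1, 2: 2}
```

With the deque and dict in place, each of the V restarts does only constant-time work per vertex and edge, so the quadratic restart cost is unavoidable but the per-restart overhead disappears.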

BearAqua