
I'm trying to get the list of path nodes for many node pairs, similar to APSP (all-pairs shortest path), and want to use RAPIDS cuGraph for GPU acceleration. After some research I wrote the script below. It works, but it's very slow. I suspect I'm iterating the wrong way and that there is a faster way to get the same result. Am I on the wrong track? Thank you!

import sqlalchemy
import cugraph
import cudf
import pandas as pd
from datetime import datetime

s_time = datetime.now()
engine = sqlalchemy.create_engine('postgresql://postgres:xxxxxxx@localhost:5432/postgres')
sql = "select id, source, target, cost, geom from xxx.roads_noded"
rc = "select source, target from xxx.routin_candidates"
df = pd.read_sql(sql, engine)
rcdf = pd.read_sql(rc, engine)
cuda_g = cudf.DataFrame.from_pandas(df)
cuda_nc = cudf.DataFrame.from_pandas(rcdf)

G = cugraph.Graph()
G.from_cudf_edgelist(cuda_g, source='source', destination='target', edge_attr='cost')

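# For each candidate (source, target) pair, run SSSP from the source
for _, cand in cuda_nc.to_pandas().iterrows():
    src = cand['source']
    dest = cand['target']
    routes = cugraph.sssp(G, src)
    # Scan every vertex of the SSSP result looking for dest
    # (inner loop variables renamed so they don't shadow the outer ones)
    for _, r in routes.to_pandas().iterrows():
        v = int(r['vertex'])
        if v == dest:
            # Extract the traversed path to v from the SSSP result
            p = cugraph.utils.get_traversed_path_list(routes, v)
            autoroute = p[::-1]  # reverse the returned vertex list
            print(autoroute)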

e_time = datetime.now()
print('Duration: {}'.format(e_time - s_time))
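A likely hotspot is the inner iterrows() over every vertex of the SSSP result just to find dest. The SSSP result already carries a predecessor column, so the route can be read off by following predecessors from dest back to the source. Below is a minimal CPU sketch of that idea on a pandas DataFrame laid out like the cugraph.sssp output (vertex, distance, predecessor). It assumes the source and any unreachable vertex store predecessor -1; the helper name traced_path is hypothetical, and this is not the actual implementation of cugraph.utils.get_traversed_path_list.

```python
import pandas as pd


def traced_path(routes: pd.DataFrame, src: int, dest: int) -> list:
    """Vertices on the shortest path src -> dest, or [] if dest is unreachable.

    Assumes `routes` has 'vertex' and 'predecessor' columns and that vertices
    without a predecessor (the source, unreachable ones) store -1.
    """
    # Build the lookup once; each hop is then a dict access, not a full scan.
    pred = dict(zip(routes["vertex"], routes["predecessor"]))
    if dest not in pred:
        return []
    path = [dest]
    while path[-1] != src:
        p = int(pred[path[-1]])
        if p == -1 or len(path) > len(pred):  # unreachable, or malformed table
            return []
        path.append(p)
    path.reverse()  # we walked dest -> src; flip to src -> dest
    return path


# Toy SSSP result for source 0 on edges 0-1, 1-2, 0-3; vertex 4 is unreachable.
routes = pd.DataFrame({
    "vertex": [0, 1, 2, 3, 4],
    "distance": [0.0, 1.0, 2.0, 1.0, float("inf")],
    "predecessor": [-1, 0, 1, 0, -1],
})
print(traced_path(routes, 0, 2))  # [0, 1, 2]
```

With cuGraph, the SSSP result would be converted with .to_pandas() once per source before building the lookup, instead of iterating it row by row.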
buddemat

1 Answer


cuGraph is working on APSP algorithms and should have one out this year.

We do have a new function called "multi_source_bfs" which lets you specify multiple starting source nodes from which you can run a BFS. The catch is that if paths cross, the path from the first (lowest node ID) source wins.
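To see why crossing paths matter, here is a small CPU sketch of a level-synchronous multi-source BFS in which every vertex keeps only one predecessor and one owning source. When two frontiers reach a vertex in the same step, this sketch gives it to the lowest source ID. That tie-break follows the behaviour described above but is an assumption; it is not cuGraph's code, and multi_source_bfs_sketch is a hypothetical name.

```python
def multi_source_bfs_sketch(adj: dict, sources: list) -> dict:
    """Return {vertex: (owner_source, predecessor, depth)} for reached vertices.

    Each vertex keeps a single entry, so where BFS trees from different
    sources collide, only one source "owns" the vertex.
    """
    srcs = sorted(set(sources))
    seen = {s: (s, -1, 0) for s in srcs}
    frontier, depth = srcs, 0
    while frontier:
        depth += 1
        claims = {}  # vertex -> (owner, predecessor) proposed at this level
        for u in frontier:
            owner = seen[u][0]
            for v in adj.get(u, []):
                if v in seen:
                    continue
                # Tie-break within a level: the lowest owning source ID wins.
                if v not in claims or owner < claims[v][0]:
                    claims[v] = (owner, u)
        for v, (owner, pred) in claims.items():
            seen[v] = (owner, pred, depth)
        frontier = sorted(claims)
    return seen


# Undirected path 1-2-3-4-5, BFS seeded from both ends.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
tree = multi_source_bfs_sketch(adj, [5, 1])
print(tree[3])  # (1, 2, 2): equidistant from 1 and 5, source 1 keeps it
```

Because vertex 3 is recorded only under source 1, the route from 5 to 3 cannot be read back from this single result.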

We are also in the process of adding a better function for extracting paths. That code is in the C++ library but is not yet available at the Python layer. That function will allow you to specify starting and ending node IDs and then, in parallel, extract all those paths from a BFS or SSSP result.
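The idea of extracting many paths at once from one BFS/SSSP result can be sketched on the CPU with NumPy: keep one cursor per requested destination and advance all cursors through the predecessor array in lock-step, so each step is one vectorized gather instead of a Python loop per path. This only illustrates the technique, assuming vertices are numbered 0..n-1 and -1 means "no predecessor". It is not the cuGraph C++ function mentioned above, and extract_paths is a hypothetical name.

```python
import numpy as np


def extract_paths(pred, src: int, dests) -> list:
    """Paths src -> d for every d in dests ([] when d is unreachable).

    Assumes vertices are 0..n-1 and pred[v] == -1 when v has no predecessor.
    """
    pred = np.asarray(pred, dtype=np.int64)
    cursor = np.asarray(dests, dtype=np.int64).copy()  # one walk per dest
    steps = [cursor.copy()]
    for _ in range(len(pred)):  # a simple path has fewer than n hops
        active = (cursor != src) & (cursor != -1)
        if not active.any():
            break
        # One vectorized gather moves every unfinished walk one hop back.
        cursor[active] = pred[cursor[active]]
        steps.append(cursor.copy())
    walks = np.stack(steps, axis=1)  # row i: dest_i, its predecessor, ...
    paths = []
    for row in walks:  # assemble each walk into a src -> dest list
        hit = row == src
        if not hit.any():  # walk ran into -1: dest is unreachable
            paths.append([])
            continue
        end = int(np.argmax(hit))  # first step at which the walk reached src
        paths.append([int(v) for v in row[: end + 1][::-1]])
    return paths


# Predecessors for source 0 on edges 0-1, 1-2, 0-3; vertex 4 is unreachable.
pred = [-1, 0, 1, 0, -1]
print(extract_paths(pred, 0, [2, 3, 4, 0]))  # [[0, 1, 2], [0, 3], [], [0]]
```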

Brad Rees
  • Brad Rees, thank you for the response. I removed the second loop and the code is much faster. Unfortunately, I couldn't find documentation on how cugraph.utils.get_traversed_path_list() works. It seems to need the traversal result and a destination value (int). I changed it to p = cugraph.utils.get_traversed_path_list(routes, dest) and it's faster. Now I'm trying to use pandas vectorization or .apply() to remove the first loop. – HumanoVirtual Jan 15 '22 at 02:17
  • Brad Rees, could you please point me to the documentation? I couldn't find anything about multi_source_bfs. I reviewed all the pages here: https://docs.rapids.ai/api/cugraph/stable/api_docs/api/cugraph.traversal.bfs.bfs.html Thank you! – HumanoVirtual Jan 15 '22 at 08:44
  • It seems it's not implemented yet in RAPIDS 21.12: "NotImplementedError: concurrent_bfs is coming soon! Please up vote the github issue 1465 to help us prioritize" – HumanoVirtual Jan 16 '22 at 06:06
  • The function is "multi_source_bfs". I noticed that the docs were missing; they should show up in the nightly build later this week – Brad Rees Jan 17 '22 at 23:39
  • @HumanoVirtual The get_traversed_path_list code and docs should be ready this week – Brad Rees Jan 17 '22 at 23:40
  • Thank you for your response. I'll follow the webpage and GitHub for documentation! – HumanoVirtual Jan 18 '22 at 17:16
  • We rolled multi-seed BFS into the standard bfs call: the seed can now be an array. This saves users from having two different function calls – Brad Rees Jan 21 '22 at 17:43
  • Thank you for the update. Does that mean I have to reinstall the nightly version? Also, could you please post the URL where the updated docs are? I didn't find anything about get_traversed_path_list. Thank you! – HumanoVirtual Jan 23 '22 at 09:43
  • @HumanoVirtual the code has been there for a while, so no need to reinstall. We have wanted to make it better before fully advertising it, so the docs have been slow in coming. – Brad Rees Jan 24 '22 at 13:52
  • got it, thank you! – HumanoVirtual Jan 25 '22 at 08:11
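The remaining goal in the comments, removing the first loop, can be approached without vectorizing SSSP itself: when several candidate rows share a source, run SSSP once per distinct source and answer all of that source's targets from the same result. The sketch below shows the batching with pandas groupby. run_sssp is a hypothetical CPU stand-in (a heapq Dijkstra returning a predecessor dict) for cugraph.sssp(G, src); it assumes undirected edges with non-negative costs and the source/target/cost column names from the question.

```python
import heapq
from collections import defaultdict

import pandas as pd


def build_adjacency(edges: pd.DataFrame) -> dict:
    """Undirected adjacency list from source/target/cost columns (assumed layout)."""
    adj = defaultdict(list)
    for s, t, c in edges[["source", "target", "cost"]].itertuples(index=False):
        adj[s].append((t, c))
        adj[t].append((s, c))
    return adj


def run_sssp(adj: dict, src: int) -> dict:
    """Hypothetical stand-in for cugraph.sssp: Dijkstra returning {vertex: predecessor}."""
    dist, pred = {src: 0.0}, {src: -1}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry: a shorter distance was already found
        for v, c in adj[u]:
            if d + c < dist.get(v, float("inf")):
                dist[v], pred[v] = d + c, u
                heapq.heappush(heap, (d + c, v))
    return pred


def routes_for_candidates(edges: pd.DataFrame, candidates: pd.DataFrame) -> dict:
    """Map every (source, target) candidate to its vertex list ([] if unreachable)."""
    adj = build_adjacency(edges)
    routes = {}
    for src, group in candidates.groupby("source"):
        src = int(src)
        pred = run_sssp(adj, src)  # one shortest-path solve per distinct source
        for dest in group["target"]:
            path, v = [], int(dest)
            while v != -1 and v in pred:  # follow predecessors back to src
                path.append(int(v))
                v = pred[v]
            routes[(src, int(dest))] = path[::-1]
    return routes


edges = pd.DataFrame({"source": [0, 1, 0, 2], "target": [1, 2, 2, 3],
                      "cost": [1.0, 1.0, 5.0, 1.0]})
candidates = pd.DataFrame({"source": [0, 0, 2, 0], "target": [2, 3, 0, 9]})
result = routes_for_candidates(edges, candidates)
print(result[(0, 3)])  # [0, 1, 2, 3]
```

Here four candidate rows need only two SSSP solves (sources 0 and 2); the saving grows with the number of targets per source.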