
I tried to create a Graph from a dask_cudf DataFrame, but the resulting graph is a NoneType object and no error message is shown. I tried the same data set with a pandas DataFrame, and then with just three sample edges. Each time I get a NoneType object. However, if I use the Karate dataset, everything works. I perform the exact same steps each time, and the column types are also the same.

From_dask_edgelist

cluster = LocalCUDACluster()
client = Client(cluster)
Comms.comms.initialize(p2p=True)

edges = dask.read_csv('.csv')
edges = edges.groupby(['Source','Target'])['retweet_from'].count()
edges = edges.to_frame(name="weight").reset_index()
edges = edges.map_partitions(cudf.DataFrame.from_pandas)
G = cugraph.Graph().from_dask_cudf_edgelist(edges,
                                            source = 'Source',
                                            destination = 'Target',
                                            edge_attr = 'weight')

G.__class__
NoneType

From_Pandas_edgelist Karate Dataset

url = 'https://raw.githubusercontent.com/rapidsai/cugraph/branch-22.10/datasets/karate.csv'
df = pd.read_csv(url,delimiter=' ', header=None, names=["0", "1", "2"],
dtype={"0": "int32", "1": "int32","2": "float32"})

G = cugraph.Graph()
G.from_pandas_edgelist(df, source='0', destination='1',edge_attr='2', renumber=False)

G.__class__
cugraph.structure.graph_classes.Graph

From_Pandas_edgelist

edges = pd.read_csv('.csv')
edges = edges.groupby(['Source','Target'])['retweet_from'].count()
edges = edges.to_frame(name="weight").reset_index()
edges['Source'] = edges['Source'].astype("int32")
edges['Target'] = edges['Target'].astype("int32")
edges['weight'] = edges['weight'].astype("float32")
edges.dtypes
Source      int32
Target      int32
weight    float32
dtype: object

G = cugraph.Graph()
G = G.from_pandas_edgelist(edges,source = 'Source',destination = 'Target',edge_attr = 'weight', renumber=False)

G.__class__
NoneType

From_Pandas_edgelist with three Edges

data = [[1, 3,3], [2, 1,1], [3, 1, 7]]
edges = pd.DataFrame(data, columns=['Source', 'Target', 'weight'])
edges['Source'] = edges['Source'].astype("int32")
edges['Target'] = edges['Target'].astype("int32")
edges['weight'] = edges['weight'].astype("float32")
G = cugraph.Graph()
G = G.from_pandas_edgelist(edges,source = 'Source',
                                            destination='Target',edge_attr = 'weight', renumber=False)
G.__class__
NoneType
padul

2 Answers


Example 1: From_dask_edgelist
You need to add edges['weight'] = edges['weight'].astype("float32") so that the dtype of weight is correct. Otherwise from_dask_cudf_edgelist will throw an error and return None.

Example 3: From_Pandas_edgelist
Example 4: From_Pandas_edgelist with three Edges
This does not currently work, because from_pandas_edgelist returns None (the PR mentioned above fixes that). If you change
G = G.from_pandas_edgelist(edges,source = 'Source',destination = 'Target',edge_attr = 'weight', renumber=False)
to be just
G.from_pandas_edgelist(edges,source = 'Source',destination = 'Target',edge_attr = 'weight', renumber=False)
then it will work
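To illustrate why the assignment loses the graph, here is a minimal sketch in plain Python. InPlaceGraph and load_edges are hypothetical stand-ins invented for this illustration, not part of cuGraph. They only mimic the behaviour described above, where the loader fills the object in place and returns None.

```python
class InPlaceGraph:
    """Hypothetical stand-in for an object with an in-place loader (not cuGraph)."""

    def __init__(self):
        self.edges = []

    def load_edges(self, edges):
        # Mutates the object; there is no return statement, so the call yields None.
        self.edges = list(edges)


sample_edges = [(1, 3, 3.0), (2, 1, 1.0), (3, 1, 7.0)]

# Pattern from Examples 3 and 4: the return value (None) overwrites the graph.
g_lost = InPlaceGraph()
g_lost = g_lost.load_edges(sample_edges)

# Working pattern: keep the reference and call the loader only for its side effect.
g_kept = InPlaceGraph()
g_kept.load_edges(sample_edges)

print(type(g_lost).__name__)  # NoneType
print(type(g_kept).__name__)  # InPlaceGraph
print(len(g_kept.edges))      # 3
```

The same reasoning would apply to any loader that returns None: reassigning the variable to the call result discards the object that was just populated.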

BradRees
  • Thanks. The hint for Example 1 does not work: I still get no error and a None. But the tip for from_pandas_edgelist works. – padul Sep 28 '22 at 12:18

Not all of the cuGraph functions can be pipelined (chained).
The call to from_dask_cudf_edgelist returns None, so assigning its result to G replaces the graph with None.

The preferred way is:

G = cugraph.Graph()
G.from_dask_cudf_edgelist(edges,
                           source = 'Source',
                           destination = 'Target',
                           edge_attr = 'weight')

Then G.__class__ will work.

BradRees