Merge output from cugraph over vertex_id with input data

Question

If I create a graph with cugraph and then compute node positions or communities, I get back a dataframe that contains the results and a vertex id.

So I have three questions:

1. How is the vertex id created?
2. Is there a way to merge the output data over the vertex id with the input data?
3. Is it possible to store the information like in networkx directly in the graph object?

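import cudf
import cugraph

# edges is a cudf.DataFrame with 'source' and 'target' columns
G = cugraph.from_cudf_edgelist(edges, source='source', destination='target')

# louvain returns a (dataframe, modularity score) tuple
communities, modularity_score = cugraph.louvain(G)

pos = cugraph.force_atlas2(G, max_iter=10)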

#################################

Answer to 2.

With the help of @Don_A's answer and the comments from @BradRees, I was able to merge the output data with the input data. The first step is to create a unique node list; the second is to merge it with the output data.

import cudf
import cugraph as cnx  # assumed alias; the original snippet used cnx without showing the import

edges = cudf.read_csv('edges.csv')
nodes_source = edges.loc[:, ['Source', 'retweet_author']].rename(columns={"Source": "node", "retweet_author": "author"})
nodes_target = edges.loc[:, ['Target', 'orginal_author']].rename(columns={"Target": "node", "orginal_author": "author"})
# Stack both node tables and keep one row per node id
# (cudf.concat replaces the deprecated DataFrame.append)
node_list = cudf.concat([nodes_source, nodes_target]).drop_duplicates('node')

G = cnx.from_cudf_edgelist(edges, source='Source', destination='Target', edge_attr='weight')

communities, modularity_score = cnx.louvain(G)

# Attach the Louvain partition to each node by matching node id to vertex id
merged = node_list.merge(communities, left_on="node", right_on="vertex").reset_index()

score 2 · Accepted Answer · answered Sep 14 '22 at 19:24

1: How is the vertex id created? In your example you have an "edges" dataframe that contains the COO data. That data specifies the vertex IDs. cuGraph uses the IDs that you specify; it does not create new ones.

2: Is there a way to merge the output data over the vertex id with the input data? In your example the input dataframe holds edge data, while the algorithm returns vertex data. You can still join the cluster information back onto the src column and then onto the dst column of the edge data. That is all done with cuDF.

3: Is it possible to store the information like in networkx directly in the graph object? Yes. You just need to use the new PropertyGraph class. See the example below, taken from a presentation at a recent GTC.

import cudf 
import cugraph 
from cugraph.experimental import PropertyGraph

# Import a built-in dataset
from cugraph.experimental.datasets import karate

# Create a graph using the imported Dataset object
graph = cugraph.Graph(directed=False)
G = karate.get_graph(create_using=graph, fetch=True)

# Read edgelist data into a DataFrame, load into PropertyGraph as edge data.
df = G.edgelist.edgelist_df
pG = PropertyGraph()
pG.add_edge_data(df, vertex_col_names=("src", "dst"))

# Run Louvain to get the partition number for each vertex. 
# Set resolution accordingly to identify two primary partitions. 
(partition_info, _) = cugraph.louvain(pG.extract_subgraph(create_using=graph), resolution=0.6)

# Add the partition numbers back to the Property Graph as vertex properties 
pG.add_vertex_data(partition_info, vertex_col_name="vertex")

# Use the partition properties to extract a Graph for each partition. 
G0 = pG.extract_subgraph(selection=pG.select_vertices("partition == 0"))
G1 = pG.extract_subgraph(selection=pG.select_vertices("partition == 1"))

# Run pagerank on each graph, print results.
pageranks0 = cugraph.pagerank(G0)
pageranks1 = cugraph.pagerank(G1)
print(pageranks0.sort_values(by="pagerank", ascending=False).head(3))
print(pageranks1.sort_values(by="pagerank", ascending=False).head(3))
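
For point 2, here is a minimal sketch of joining the partition labels back onto both endpoints of each edge. It uses pandas as a CPU stand-in so it runs without a GPU; cuDF offers the same rename and merge calls, but that substitution, the toy data, and the column names src_partition and dst_partition are assumptions made for illustration, not part of the original example.

```python
import pandas as pd  # CPU stand-in; with a GPU, cudf provides the same rename/merge calls

# Toy edge list and a Louvain-style result (vertex id -> partition)
edges = pd.DataFrame({"src": [0, 0, 2], "dst": [1, 2, 3]})
communities = pd.DataFrame({"vertex": [0, 1, 2, 3], "partition": [0, 0, 1, 1]})

# Rename the result once per endpoint so the two merges cannot collide
src_part = communities.rename(columns={"vertex": "src", "partition": "src_partition"})
dst_part = communities.rename(columns={"vertex": "dst", "partition": "dst_partition"})

# Match on vertex id values: first the source column, then the destination column
labelled = edges.merge(src_part, on="src", how="left").merge(dst_part, on="dst", how="left")

# Flag edges whose endpoints fall into the same partition
labelled["same_community"] = labelled["src_partition"] == labelled["dst_partition"]
print(labelled)
```

If the row order matters, sort the result explicitly after merging, since cuDF may not preserve the input order the way pandas does here.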

Thank you @Don A. In my edge list I have the source and destination of every edge, which are also the nodes/vertices. Are the ids in the source and destination columns then the same as the vertex_id? If not, can you give me an example of how to join a node/vertex from source or destination with the vertex_id? — padul, Sep 15 '22 at 08:46
Yes, the edge data consists of source vertex id to destination vertex id pairs. So the ids returned by the louvain call will match what is in the edge list. — BradRees, Sep 15 '22 at 18:21
The "edges" data is in a cuDF DataFrame. You can then just do an edges.join(communities, lsuffix="src", rsuffix="vertex"), which will produce a new dataframe with (src, dst, partition). But you need to remember that the partition relates to the src vertex. A better approach would be to rename the columns first. I'll try to get an example posted. — BradRees, Sep 15 '22 at 18:27
@BradRees Thanks for your help. It does not work with join, but it does with merge. I updated my question. — padul, Sep 19 '22 at 20:11
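
A likely reason join did not work while merge did: DataFrame.join lines rows up by index, whereas merge matches on column values. The sketch below shows the difference on toy data. It uses pandas as a CPU stand-in so it runs without a GPU (cuDF's join is likewise index-based), and the data and variable names are assumptions for illustration only.

```python
import pandas as pd  # CPU stand-in for cudf

# Edge sources are deliberately NOT in the same order as the vertex rows
edges = pd.DataFrame({"src": [3, 1, 2], "dst": [1, 2, 3]})
communities = pd.DataFrame({"vertex": [1, 2, 3], "partition": [0, 0, 1]})

# join pairs row i of edges with row i of communities, ignoring the ids
by_index = edges.join(communities)

# merge looks up each src value in the vertex column
by_value = edges.merge(communities, left_on="src", right_on="vertex", how="left")

print(by_index[["src", "vertex", "partition"]])  # src and vertex disagree row by row
print(by_value[["src", "vertex", "partition"]])  # src and vertex agree on every row
```

With join, the first edge (src 3) picks up the partition of vertex 1 because both sit in row 0; with merge, it gets the partition that actually belongs to vertex 3.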
