I want to convert existing R code to PySpark. The code builds an undirected graph from the pairs in an edge list.
R code (using the igraph library):
# create an undirected graph using the selected pairs
gg <- graph.edgelist(as.matrix(unique(df[, list(valx, valy)])), directed = FALSE)
# group the vertex names by connected-component membership
cl <- split(V(gg)$name, clusters(gg)$membership)
# from the constructed graph, build a table of vertex names and their membership
dt <- cbind(as.data.table(V(gg)$name), as.data.table(clusters(gg)$membership))
My input data frame is df:
valx valy
1: 600060 09283744
2: 600131 96733110
3: 600194 01700001
I have tried GraphFrames in PySpark and the networkx library too, but I am not getting the desired results.
My output should look like the following (basically, all valx and valy values under V1 and their membership info under V2):
V1 V2
600060 1
96733110 1
01700001 2
600194 2
Can anyone please guide me on how to implement the above code in PySpark? Even if the output does not look exactly like the above, that is fine, but I need the equivalent code snippet or library.
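For reference, what `clusters(gg)$membership` computes is a connected-component label per vertex. The following is a minimal plain-Python sketch of that labelling using union-find, not a PySpark or igraph API. It assumes the distinct (valx, valy) pairs are small enough to be collected to the driver, and the function name `connected_components` is hypothetical. IDs are kept as strings so that leading zeros such as `09283744` survive. Components are numbered from 1 in order of first appearance, which may not match igraph's numbering in every case.

```python
def connected_components(edges):
    """Map each vertex to a 1-based component id (hypothetical helper)."""
    parent = {}

    def find(v):
        # Register unseen vertices as their own root.
        parent.setdefault(v, v)
        root = v
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every vertex on the path at the root.
        while parent[v] != root:
            parent[v], v = root, parent[v]
        return root

    # Merge the two components joined by each undirected edge.
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Number the components in order of first appearance of their vertices.
    labels = {}
    membership = {}
    for v in list(parent):
        root = find(v)
        if root not in labels:
            labels[root] = len(labels) + 1
        membership[v] = labels[root]
    return membership


# The three pairs from the input data frame in the question, as strings.
edges = [
    ("600060", "09283744"),
    ("600131", "96733110"),
    ("600194", "01700001"),
]
membership = connected_components(edges)
for name, component in membership.items():
    print(name, component)
```

With a Spark DataFrame, the pairs could be obtained with something like `df.select("valx", "valy").distinct().collect()` before calling the function. For data too large for the driver, GraphFrames' `connectedComponents()` is the distributed counterpart; it needs a checkpoint directory to be set on the SparkContext first.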