I want to convert existing R code to PySpark. The code I am converting creates an undirected graph from the pairs in an edge list.

R code (using the igraph library):

  # create an undirected graph using the selected pairs
  gg <- graph.edgelist(as.matrix(unique(df[, list(valx, valy)])), directed = FALSE)
  # group the vertex names by cluster membership
  cl <- split(V(gg)$name, clusters(gg)$membership)

  # from the constructed graph, build a table of node names and their membership
  dt <- cbind(as.data.table(V(gg)$name), as.data.table(clusters(gg)$membership))

My input dataframe is df:

    valx      valy 
1: 600060     09283744
2: 600131     96733110 
3: 600194     01700001

I have tried GraphFrames in PySpark and the networkx library too, but I am not getting the desired results.

My output should look like the table below (it is basically all valx and valy values under V1 and their membership info under V2):

V1               V2
600060           1
96733110         1
01700001         2
600194           2

Can anyone please guide me on how to implement the above code in PySpark? (Even if the output doesn't look exactly like the above, that's okay, but I need the equivalent code snippet or library.)
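For reference, `clusters(gg)$membership` in igraph labels each vertex with the connected component it belongs to, so the target is a vertex-to-component table. Below is a minimal sketch of that computation in plain Python (union-find), not a Spark implementation; the function name `component_membership` is made up for illustration, and the component ids are simply assigned in order of first appearance, which may not match igraph's numbering. Vertex ids are kept as strings so that leading zeros such as `09283744` survive.

```python
def component_membership(edges):
    """Map each vertex of an undirected edge list to a 1-based component id.

    Hypothetical helper for illustration: ids are handed out in the order
    in which components are first seen, not necessarily igraph's order.
    """
    parent = {}

    def find(v):
        # Register unseen vertices as their own root.
        parent.setdefault(v, v)
        root = v
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every vertex on the path at the root.
        while parent[v] != root:
            parent[v], v = root, parent[v]
        return root

    # Merge the two endpoints of every edge into one component.
    for x, y in edges:
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[ry] = rx

    ids = {}
    membership = {}
    for v in parent:
        root = find(v)
        membership[v] = ids.setdefault(root, len(ids) + 1)
    return membership


# The three sample rows from df, as (valx, valy) string pairs.
sample_edges = [
    ("600060", "09283744"),
    ("600131", "96733110"),
    ("600194", "01700001"),
]
membership = component_membership(sample_edges)
for vertex, component in membership.items():
    print(vertex, component)
```

With only these three sample rows, each pair forms its own component. On Spark, GraphFrames provides a `connectedComponents()` method on a `GraphFrame` that computes the same kind of labelling for large graphs (it needs a checkpoint directory to be set), and in networkx `nx.connected_components(G)` yields the vertex sets directly.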

Tilo
  • When the `id` is part of valx the V2 value is 1 and when `id` is part of valy then the V2 value becomes 2. Is this correct? – cronoik Jun 08 '19 at 20:48
  • No, V2 basically shows the cluster membership, i.e. the result of community detection – Tilo Jun 09 '19 at 10:33
  • Okay. Which community detection algorithm did you use? There are at least 4 in graphframes. – cronoik Jun 09 '19 at 11:27
  • I am now using, from igraph: 1. `community_fastgreedy()` 2. `community_edge_betweenness`. 3. I also tried `girvan_newman(G)` from networkx – Tilo Jun 09 '19 at 11:29
  • But I am unable to get vertex with its membership – Tilo Jun 09 '19 at 11:30
  • Does the community detection algorithm not matter for you? I can post you a solution with graphframes. – cronoik Jun 09 '19 at 11:36
  • Yeah, it would be a great help if you can post the solution. What I want is the list of vertices along with their membership. I tried the below with networkx too: `g = nx.from_pandas_edgelist(Panda_edgelist, 'valx', 'valy')`, `list(g.nodes)` – Tilo Jun 09 '19 at 11:40
  • `G2 = g.to_undirected(g)`, `list(G2.nodes)`, `#list(G2)`, `nx.connected_components(G2)`, `nx.clustering(G2)` – Tilo Jun 09 '19 at 11:41
  • @cronoik: it would be a great help if you could please post the solution with graphframes that you mentioned – Tilo Jun 10 '19 at 08:39
