I have extracted the links between Wikipedia pages into an RDD with the following format:

Array[(String, String)] = Array((AccessibleComputing,[Computer accessibility]), 
                      (Anarchism,[political philosophy, stateless society]))

Here the first string is a page (vertex) and the second is a bracketed, comma-separated list of links (edges) pointing to other Wikipedia pages.

How can I convert it into a graph-friendly format like this:

Array(
(AccessibleComputing,Computer accessibility),
(Anarchism,stateless society),
(Anarchism,political philosophy)
)

so that the source vertex is repeated once for each of its edges?

ulrich

1 Answer

How about drop, split and flatMap? Drop the surrounding brackets, split on ", ", and pair each link with its page:

data.flatMap{case (k, v) => v.drop(1).dropRight(1).split(", ").map((k, _))}
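
To try the same idea without a Spark cluster, here is a minimal sketch on a plain Scala `Seq`, assuming the links really are stored as one bracketed string per page. The sample data, the `Orphan` page and the `toEdges` helper are hypothetical names made up for illustration; the `flatMap` call has the same shape as the one on the RDD above.

```scala
// Hypothetical sample data mirroring the question's (page, "[link, link]") shape.
val data: Seq[(String, String)] = Seq(
  ("AccessibleComputing", "[Computer accessibility]"),
  ("Anarchism", "[political philosophy, stateless society]"),
  ("Orphan", "[]") // assumed edge case: a page with no links
)

// Strip the surrounding brackets, split on ", ", and emit one (page, link) pair per link.
def toEdges(page: String, links: String): Seq[(String, String)] =
  links.stripPrefix("[").stripSuffix("]")
    .split(", ")
    .filter(_.nonEmpty) // "".split(", ") yields Array(""), so drop the blank entry
    .map(link => (page, link))
    .toSeq

// One flat collection of edges, with the source page repeated per link.
val edges: Seq[(String, String)] =
  data.flatMap { case (page, links) => toEdges(page, links) }
```

Two caveats with this approach: a page with an empty list would otherwise produce an edge to the empty string, hence the `filter`, and any link title that itself contains ", " will be split into two edges.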
zero323