
I am new to Spark GraphX and have a DataFrame for the edges:

DataFrame: edges_main
+------------------+------------------+------------+--------+-----------+
|               src|               dst|relationship|category|subcategory|
+------------------+------------------+------------+--------+-----------+
|294201130817328347|294201131015844283|      friend|  school|      class|
|294201131015844283|294201131007361339|     brother|    home|     cousin|
|294201131015844283|294201131014451003|         son|    home|   relative|
+------------------+------------------+------------+--------+-----------+

and one for the vertices:

DataFrame: vertices_main
+------------------+-----+----+
|                id|value|name|
+------------------+-----+----+
|294201130817328347| Mary|   a|
|294201131015844283| Hola|   b|
|294201131015844283| Rama|   c|
+------------------+-----+----+

I want to preserve the additional attributes in GraphX so that I can access them with map. My code:

case class MyEdges(src: String, dst: String, attributes: MyEdgesLabel)
case class MyEdgesLabel(relationship:String,category: String ,subcategory:String)

val edges = edges_main.as[MyEdges].rdd.map { edge =>
      Edge(
        edge.src.toLong,
        edge.dst.toLong,
        //**what to mention here(MyEdgesLabel)**//
      )}

case class MyVerticesLabel(name:String)

val vertices: RDD[(VertexId, Any)] = vertices_data.rdd.map(verticesRow => (
      verticesRow.getLong(0),
      verticesRow.getString(1))
//**what to mention here(MyVerticesLabel)**//
    )

The reason for this requirement is that, after creating the graph, I want to access the additional attributes directly, like this:

val g = Graph(vertices, edges)
g.vertices.map(v => v._1 + v._2 + /* additional attributes from case class MyVerticesLabel */).collect.mkString
g.edges.map(e => e.srcId + e.dstId + e.attr(/* additional attributes from case class MyEdgesLabel */)).collect.mkString

I got some clues from the URL below, but I am still confused about how to handle multiple attributes on both vertices and edges: http://www.sunlab.org/teaching/cse6250/fall2019/spark/spark-graphx.html#graph-construction.

Kindly help regarding the same.

Arshanvit

1 Answer


You can use one case class as the edge attribute and another as the vertex property. MyEdgesLabel is already fine for the edges. To create the edge RDD, simply do:

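// MyEdges must mirror the DataFrame columns for .as[MyEdges] to work
// (assumes src and dst are string columns, as in the question).
case class MyEdges(src: String, dst: String, relationship: String, category: String, subcategory: String)

val edges: RDD[Edge[MyEdgesLabel]] = edges_main.as[MyEdges].rdd.map { edge =>
  Edge(
    edge.src.toLong,
    edge.dst.toLong,
    MyEdgesLabel(edge.relationship, edge.category, edge.subcategory)
  )
}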

For the vertices, you need to include both value and name in the case class:

case class MyVerticesLabel(value: String, name: String)

Then use it to create the vertex RDD:

val vertices: RDD[(VertexId, MyVerticesLabel)] = vertices_data.rdd.map{verticesRow => 
    (verticesRow.getAs[Long]("id"),
    MyVerticesLabel(verticesRow.getAs[String]("value"), verticesRow.getAs[String]("name")))
}

Now, the values can easily be accessed, e.g.:

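// String interpolation keeps the two IDs separate; srcId + dstId would add the Longs numerically.
g.edges.map(e => s"${e.srcId} ${e.dstId} ${e.attr.relationship}").collect.mkString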
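
For reference, here is a minimal end-to-end sketch of the approach described above, under a few assumptions: the data is hard-coded with short placeholder IDs (1, 2, 3) instead of being read from the DataFrames in the question, and the object name `GraphAttributesSketch`, the method `edgeLines`, and the local SparkSession settings are made up for illustration. It uses `g.triplets`, which exposes the attributes of both endpoints along with the edge attribute.

```scala
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Attribute types, as in the answer above.
case class MyEdgesLabel(relationship: String, category: String, subcategory: String)
case class MyVerticesLabel(value: String, name: String)

// Hypothetical wrapper object, only here to make the sketch self-contained.
object GraphAttributesSketch {
  // Builds a tiny graph from hard-coded sample rows (placeholder IDs) and
  // returns one formatted line per edge, sorted for a stable order.
  def edgeLines(spark: SparkSession): Array[String] = {
    val sc = spark.sparkContext

    val vertices: RDD[(VertexId, MyVerticesLabel)] = sc.parallelize(Seq(
      (1L, MyVerticesLabel("Mary", "a")),
      (2L, MyVerticesLabel("Hola", "b")),
      (3L, MyVerticesLabel("Rama", "c"))
    ))

    val edges: RDD[Edge[MyEdgesLabel]] = sc.parallelize(Seq(
      Edge(1L, 2L, MyEdgesLabel("friend", "school", "class")),
      Edge(2L, 3L, MyEdgesLabel("brother", "home", "cousin"))
    ))

    val g = Graph(vertices, edges)

    // Each triplet carries srcAttr, dstAttr (vertex labels) and attr (edge label).
    g.triplets
      .map(t => s"${t.srcAttr.value} -[${t.attr.relationship}/${t.attr.category}]-> ${t.dstAttr.value}")
      .collect()
      .sorted
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("graph-attrs-sketch")
      .getOrCreate()
    try edgeLines(spark).foreach(println)
    finally spark.stop()
  }
}
```

With real data, the two `sc.parallelize` calls would be replaced by the RDDs built from the DataFrames as shown earlier in this answer.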
Shaido
  • It is working fine except for the last ```g.edges.map``` section: it does not return the relationship. When printed to a text file, ```e.attr``` comes out as ```MyVerticesLabel(friend,school,class)```. – Arshanvit Sep 04 '20 at 14:24
  • @UtkarshSaraf: It looks like you are doing something different from the code in the question (since there is no mention of friend, school and class). If you use `g.edges.map(_.attr)`, you will get the edge attributes, which above is `MyEdgesLabel`, since that is the value we used when creating `Edge`. Maybe you mistakenly used `MyVerticesLabel` when creating the edges? – Shaido Sep 05 '20 at 05:15
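
To illustrate the difference discussed in these comments, here is a small plain-Scala sketch (no Spark needed). It assumes the same `MyEdgesLabel` shape as in the answer; the object name `AttrPrintingSketch` is made up for illustration. Printing a whole case class instance uses its generated `toString`, while selecting a field gives just that value.

```scala
// Same shape as the edge attribute class used in the answer.
case class MyEdgesLabel(relationship: String, category: String, subcategory: String)

// Hypothetical holder object for the example values.
object AttrPrintingSketch {
  val attr: MyEdgesLabel = MyEdgesLabel("friend", "school", "class")

  // Writing out the whole attribute uses the case class's generated toString.
  val whole: String = attr.toString

  // Selecting one field yields only that value.
  val onlyRelationship: String = attr.relationship

  def main(args: Array[String]): Unit = {
    println(whole)            // MyEdgesLabel(friend,school,class)
    println(onlyRelationship) // friend
  }
}
```

So output of the form `ClassName(friend,school,class)` suggests that the whole `e.attr` was written out rather than `e.attr.relationship`, and the class name in that output shows which case class was used when the edges were built.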