
I have the following problem. I have a dataframe "vert" in Spark, consisting of three columns: Origin (String), Destination (String), Distance (Integer). So it is simply data about flights between different cities. For example, it could look like this:

Chicago Houston 670
London Chicago 1200
...

I want to create the corresponding graph in GraphX, and I want to take the distances as edge attributes of the graph. So first I have to define the edges RDD. I found the following way to do this:

val ed = vert.rdd
  .map(x => ((MurmurHash.stringHash(x(0).toString), MurmurHash.stringHash(x(1).toString)), 1))
  .reduceByKey(_+_)
  .map(x => Edge(x._1._1, x._1._2, x._2))

Unfortunately, this command only takes the columns Origin and Destination into account and ignores the column Distance, so I have no information about the distances in the RDD "ed". How do I have to change the command so that the distances end up in the RDD as well?
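A plain-Scala sketch of what the pipeline above actually computes (using `groupBy` and `sum` to stand in for Spark's `reduceByKey`, so no cluster is needed to follow along):

```scala
import scala.util.hashing.MurmurHash3

// The sample rows from above: (Origin, Destination, Distance).
val rows = Seq(("Chicago", "Houston", 670), ("London", "Chicago", 1200))

// Mirror the RDD pipeline: every row becomes ((srcHash, dstHash), 1),
// and the reduce sums the 1s -- so the final value per key is a row count,
// and the Distance column never makes it into the result.
val counted = rows
  .map { case (o, d, _) => ((MurmurHash3.stringHash(o), MurmurHash3.stringHash(d)), 1) }
  .groupBy(_._1)
  .map { case (key, vs) => (key, vs.map(_._2).sum) }
```

Since the dataframe has no duplicate origin/destination pairs, every count comes out as 1, which is exactly the constant attribute that ends up on each `Edge`.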

Sorry if it is a stupid question and thanks in advance.

Jacek Laskowski
Logic_Problem_42
    Do you need a `reduceByKey` here? I.e. does the dataframe contain the same pair of cities multiple times (seems unlikely). – Shaido May 02 '18 at 09:32
  • No, I have no duplicate rows. Is the reduceByKey only for the purpose of eliminating duplicates here? I thought that reduceByKey makes some aggregation. – Logic_Problem_42 May 02 '18 at 09:36
  • Oh, I have actually found the way. I can simply replace 1 with x(2) in my command. – Logic_Problem_42 May 02 '18 at 09:40
  • Yes, it will aggregate. But your current code creates a key based on the origin and destination columns, and the `reduceByKey` then adds up the number of rows that have the same cities (since you have 1 as the value and `_+_` in the reduce). This effectively gives you one row per origin/destination pair, with a value representing the number of rows in the original dataframe in which that pair occurs. – Shaido May 02 '18 at 09:42
  • You can simply do: `map(x => Edge(MurmurHash.stringHash(x(0).toString), MurmurHash.stringHash(x(1).toString), x(2)))` directly for the same result. – Shaido May 02 '18 at 09:43
  • By the way, the Edge method seems to accept only one attribute. Do you know how I can define several attributes? Or can I simply use a list of attributes as an argument of Edge? – Logic_Problem_42 May 02 '18 at 09:54
  • The easiest would be to use a tuple or a case class; you can see a more detailed answer of mine to this question here: https://stackoverflow.com/questions/46680128/spark-graphx-add-multiple-edge-weights/46680501#46680501 – Shaido May 02 '18 at 09:58
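Putting the comments together, a sketch of the fix (assuming Spark with GraphX on the classpath; `MurmurHash3` from the Scala standard library stands in for the deprecated `MurmurHash`, and the `Route` case class is a hypothetical example for carrying several attributes):

```scala
import scala.util.hashing.MurmurHash3
import org.apache.spark.graphx.Edge

// Since there are no duplicate rows, the reduceByKey step can be dropped:
// carry the Distance column through as the edge attribute directly.
val ed = vert.rdd.map { x =>
  Edge(
    MurmurHash3.stringHash(x(0).toString).toLong, // source vertex id
    MurmurHash3.stringHash(x(1).toString).toLong, // destination vertex id
    x(2).toString.toInt                           // distance as the edge attribute
  )
}

// For several attributes per edge, wrap them in a case class (hypothetical):
case class Route(distance: Int, airline: String)
// val ed = vert.rdd.map(x => Edge(..., ..., Route(x(2).toString.toInt, x(3).toString)))
```

The hash is deterministic, so the same city name always maps to the same vertex id, which is what lets the edge endpoints line up with a separately built vertex RDD.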

0 Answers