I am just starting with GraphFrames, and although I am following the documentation, I am not able to get any result from the `aggregateMessages` function: it returns an empty DataFrame. Here is a simplified example of my problem. I have a GraphFrame object called `testGraph` such that my vertex DataFrame (`testGraph.vertices`) consists of only a single vertex, `Y`, with no vertex attributes, and my edge DataFrame (`testGraph.edges`) consists of two records like this:
| src | dst | min_ts1 | min_ts2 |
|-----|-----|---------|---------|
| X   | Y   | 20      | null    |
| Y   | X   | null    | -10     |
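To make the setup reproducible, here is a sketch of how `testGraph` could be built. It assumes Spark with the graphframes package on the classpath; the object `TestGraphSetup` and the method `buildTestGraph` are hypothetical names, and only the column names and values come from the table above. `Option[Int]` is used so that the missing timestamps become SQL nulls.

```scala
import org.apache.spark.sql.SparkSession
import org.graphframes.GraphFrame

object TestGraphSetup {
  // Builds the example graph: one vertex (Y) and two edges that both touch X.
  def buildTestGraph(spark: SparkSession): GraphFrame = {
    import spark.implicits._

    // GraphFrames expects the vertex DataFrame to have an "id" column.
    val vertices = Seq("Y").toDF("id")

    // GraphFrames expects "src" and "dst" columns on the edge DataFrame.
    // Option[Int] produces nullable integer columns; None is stored as null.
    val edges = Seq[(String, String, Option[Int], Option[Int])](
      ("X", "Y", Some(20), None),
      ("Y", "X", None, Some(-10))
    ).toDF("src", "dst", "min_ts1", "min_ts2")

    GraphFrame(vertices, edges)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("aggregateMessages-example")
      .master("local[*]")
      .getOrCreate()
    val testGraph = buildTestGraph(spark)
    testGraph.vertices.show()
    testGraph.edges.show()
    spark.stop()
  }
}
```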
Now, I want to implement a simple algorithm that sends the value of `min_ts1` to each edge's destination vertex (`dst`) and sends `min_ts2` to its source vertex (`src`). The code I am using to implement this algorithm is:
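```scala
import org.graphframes.lib.AggregateMessages
import org.apache.spark.sql.functions._

val AM = AggregateMessages
// Message sent to each edge's source vertex: the edge's min_ts2 value
val msgToSrc = AM.edge("min_ts2")
// Message sent to each edge's destination vertex: the edge's min_ts1 value
val msgToDst = AM.edge("min_ts1")

val delay = testGraph
  .aggregateMessages
  .sendToSrc(msgToSrc)
  .sendToDst(msgToDst)
  .agg(sum(AM.msg).as("avg_time_delay")) // sum of all messages received per vertex
```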
I realize there are some null values here, but regardless I would expect the message-passing algorithm to do the following: look at the first record and send a message of 20 to `Y` and a message of null to `X`; then look at the second record and send a message of null to `X` and a message of -10 to `Y`. Finally, I would expect the result to show that the sum of messages for `Y` is 10 (that is, 20 + (-10)), and that there is no record for `X` in the result, since `X` is not included in the vertex DataFrame. If `X` were included in the vertex DataFrame, I would expect its result to be simply null, since both of the messages it receives are null.
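To pin down exactly what I expect, here is a plain-Scala model of the message passing described above, with no Spark involved. It is only a sketch of my expected semantics, not how GraphFrames actually implements `aggregateMessages`; the names `ExpectedMessages`, `Edge`, `messages`, `sqlSum`, and `aggregate` are hypothetical. `None` stands in for SQL null, and `sqlSum` mimics Spark's `sum`, which ignores nulls and returns null when every input is null.

```scala
object ExpectedMessages {
  // One edge of the example graph; None stands in for SQL null.
  final case class Edge(src: String, dst: String, minTs1: Option[Int], minTs2: Option[Int])

  val edges: Seq[Edge] = Seq(
    Edge("X", "Y", Some(20), None),
    Edge("Y", "X", None, Some(-10))
  )

  // sendToDst(min_ts1) and sendToSrc(min_ts2): each edge emits one message per endpoint.
  def messages(es: Seq[Edge]): Seq[(String, Option[Int])] =
    es.flatMap(e => Seq(e.dst -> e.minTs1, e.src -> e.minTs2))

  // Mimics SQL sum(): nulls are ignored, and a group containing only nulls sums to null.
  def sqlSum(values: Seq[Option[Int]]): Option[Int] = {
    val present = values.flatten
    if (present.isEmpty) None else Some(present.sum)
  }

  // Keep only messages addressed to known vertices, then sum them per receiving vertex.
  def aggregate(es: Seq[Edge], vertexIds: Set[String]): Map[String, Option[Int]] =
    messages(es)
      .filter { case (id, _) => vertexIds.contains(id) }
      .groupBy(_._1)
      .map { case (id, msgs) => id -> sqlSum(msgs.map(_._2)) }

  def main(args: Array[String]): Unit = {
    println(aggregate(edges, Set("Y"))) // prints Map(Y -> Some(10))
    println(aggregate(edges, Set("X", "Y")))
  }
}
```

With only `Y` in the vertex set this gives `Y -> Some(10)`, and with both vertices it gives `X -> None` and `Y -> Some(10)`, which is the kind of non-empty result I was expecting from GraphFrames.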
However, what I am getting is an empty DataFrame. Could someone please help me understand why the result is empty?