Modifying Spark GraphX PageRank to do random walk with restart

Question

I am trying to implement random walk with restart by modifying the Spark GraphX implementation of the PageRank algorithm.

    def randomWalkWithRestart(graph: Graph[VertexProperty, EdgeProperty], patientID: String, numIter: Int = 10, alpha: Double = 0.15, tol: Double = 0.01): Unit = {

      var rankGraph: Graph[Double, Double] = graph
        // Associate the degree with each vertex
        .outerJoinVertices(graph.outDegrees) { (vid, vdata, deg) => deg.getOrElse(0) }
        // Set the weight on the edges based on the degree
        .mapTriplets(e => 1.0 / e.srcAttr, TripletFields.Src)
        // Set the vertex attributes to the initial pagerank values
        .mapVertices((id, attr) => alpha)

      var iteration = 0
      var prevRankGraph: Graph[Double, Double] = null
      while (iteration < numIter) {
        rankGraph.cache()

        // Compute the outgoing rank contributions of each vertex, perform local preaggregation, and
        // do the final aggregation at the receiving vertices. Requires a shuffle for aggregation.
        val rankUpdates = rankGraph.aggregateMessages[Double](
          ctx => ctx.sendToDst(ctx.srcAttr * ctx.attr), _ + _, TripletFields.Src)

        // Apply the final rank updates to get the new ranks, using join to preserve ranks of vertices
        // that didn't receive a message. Requires a shuffle for broadcasting updated ranks to the
        // edge partitions.
        prevRankGraph = rankGraph
        rankGraph = rankGraph.joinVertices(rankUpdates) {
          (id, oldRank, msgSum) => alpha + (1.0 - alpha) * msgSum
        }.cache()

        rankGraph.edges.foreachPartition(x => {}) // also materializes rankGraph.vertices
        //logInfo(s"PageRank finished iteration $iteration.")
        prevRankGraph.vertices.unpersist(false)
        prevRankGraph.edges.unpersist(false)

        iteration += 1
      }
    }

I believe the (id, oldRank, msgSum) => alpha + (1.0 - alpha) * msgSum part should be changed, but I am not sure how. I need to add the ready state probability to this line.

Furthermore, the ready state probability should be initialized somewhere before the while loop, and it has to be updated inside the while loop.

Any suggestions would be appreciated.
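
One possible reading of the change described above, as a minimal sketch rather than a definitive implementation: the constant `alpha` term in the update becomes `alpha` times a restart indicator that is 1.0 for the start vertex and 0.0 elsewhere, and the ranks are initialized from that same indicator before the loop. The names `RandomWalkWithRestart`, `run`, `sourceId` and `restart` are hypothetical and not part of the original code. The sketch assumes the start vertex is given as a `VertexId`, and it swaps `joinVertices` for `outerJoinVertices` so that a vertex receiving no message falls back to its restart mass instead of keeping its old rank.

```scala
import org.apache.spark.graphx._
import scala.reflect.ClassTag

object RandomWalkWithRestart {
  // `sourceId` is a hypothetical parameter: the vertex the walk restarts from.
  def run[VD: ClassTag, ED: ClassTag](
      graph: Graph[VD, ED],
      sourceId: VertexId,
      numIter: Int = 10,
      alpha: Double = 0.15): Graph[Double, Double] = {

    // Restart distribution: all probability mass sits on the source vertex.
    val restart: VertexId => Double = id => if (id == sourceId) 1.0 else 0.0

    var rankGraph: Graph[Double, Double] = graph
      // Associate the out-degree with each vertex
      .outerJoinVertices(graph.outDegrees) { (_, _, deg) => deg.getOrElse(0) }
      // Edge weight = 1 / out-degree of the source vertex
      .mapTriplets(e => 1.0 / e.srcAttr, TripletFields.Src)
      // Initialize the probabilities before the loop: the walker starts at the source
      .mapVertices((id, _) => restart(id))
      .cache()

    var iteration = 0
    while (iteration < numIter) {
      // Sum the probability mass flowing into each vertex along its in-edges
      val rankUpdates = rankGraph.aggregateMessages[Double](
        ctx => ctx.sendToDst(ctx.srcAttr * ctx.attr), _ + _, TripletFields.Src)

      val prevRankGraph = rankGraph
      // Update inside the loop: restart mass plus damped incoming mass.
      // Vertices without incoming messages get only their restart mass.
      rankGraph = rankGraph.outerJoinVertices(rankUpdates) {
        (id, _, msgSum) => alpha * restart(id) + (1.0 - alpha) * msgSum.getOrElse(0.0)
      }.cache()

      rankGraph.edges.foreachPartition(_ => ()) // also materializes rankGraph.vertices
      prevRankGraph.vertices.unpersist(false)
      prevRankGraph.edges.unpersist(false)
      iteration += 1
    }
    rankGraph
  }
}
```

Calling `RandomWalkWithRestart.run(graph, sourceId).vertices` would then give one score per vertex relative to the start vertex. Newer GraphX releases also expose `personalizedPageRank` and `staticPersonalizedPageRank` on a graph, which may be worth comparing against before maintaining a custom loop.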

I am not sure modifying PageRank is the best implementation, since it propagates out to every connected neighbor rather than 'walking' the graph. You could do something like ConnectedComponents, which propagates a label along a path. You would just need to have a vertex randomly pick one of its neighbors to send a positive value to, while the others either do not get a message or are passed zero. One of the issues with Spark is that it always operates across the whole graph, so doing a walk from one vertex becomes a challenge. - BradRees, May 06 '15 at 21:21
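
To make the "pick one neighbor at random" idea in the comment concrete, here is a small sketch in plain Scala, deliberately without Spark, of a single walker that restarts at the start vertex with probability `alpha`. It is an illustration under assumptions, not a GraphX implementation: the graph is assumed to fit in memory as an adjacency map, dead ends are assumed to send the walker back to the start, and `RandomWalkSimulation`, `visitFrequencies`, `neighbors`, `source` and `seed` are hypothetical names.

```scala
import scala.util.Random

object RandomWalkSimulation {
  /** Estimates how often a restarting walker visits each vertex. */
  def visitFrequencies(
      neighbors: Map[Long, Vector[Long]],
      source: Long,
      steps: Int,
      alpha: Double = 0.15,
      seed: Long = 42L): Map[Long, Double] = {
    val rng = new Random(seed)
    val counts = scala.collection.mutable.Map.empty[Long, Int].withDefaultValue(0)
    var current = source
    for (_ <- 0 until steps) {
      counts(current) += 1
      val out = neighbors.getOrElse(current, Vector.empty)
      current =
        if (out.isEmpty || rng.nextDouble() < alpha) source // restart (also at dead ends)
        else out(rng.nextInt(out.length))                   // follow one random out-edge
    }
    // Convert visit counts to fractions of the total number of steps
    counts.iterator.map { case (v, c) => v -> c.toDouble / steps }.toMap
  }
}
```

For a long enough walk the visit fractions should approach the scores that the iterative update converges to, which gives a cheap way to sanity-check a distributed version on a small graph.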

