
I have a very large graph with links between the nodes. Each edge initially has weight 1. I have to update the edge weights according to a transformed adjacency matrix:

$$M(i,j) = \frac{A(:,i) \cdot A(:,j)}{|A(:,i)|\,|A(:,j)|}$$

where A is the adjacency matrix and A(:,i) is its ith column. The new weight of the edge between nodes i and j is given by M(i,j).
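To make the formula concrete, here is a small dense reference computation in plain Python (not GraphX). It assumes M(i,j) is the cosine similarity of columns i and j of A, that A(k,i) = 1 means an edge k -> i, and that a column with no 1s gives M = 0 (the formula itself is undefined there). It is O(n^3) and only meant for checking a distributed version on tiny graphs; `transformed_weights` and the 3-vertex graph are hypothetical.

```python
import math

def transformed_weights(A):
    """Dense reference: M[i][j] = (A(:,i) . A(:,j)) / (|A(:,i)| |A(:,j)|).

    A is a square 0/1 list of lists; column i holds the inbound edges of vertex i.
    Assumption: a column with zero norm (no inbound edges) yields 0.0.
    """
    n = len(A)
    cols = [[A[k][i] for k in range(n)] for i in range(n)]
    norms = [math.sqrt(sum(x * x for x in c)) for c in cols]
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] and norms[j]:
                dot = sum(a * b for a, b in zip(cols[i], cols[j]))
                M[i][j] = dot / (norms[i] * norms[j])
    return M

# Hypothetical 3-vertex graph with edges 0 -> 1, 0 -> 2, 1 -> 2
A = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
M = transformed_weights(A)
```

Here M[1][2] is 1/sqrt(2), because columns 1 and 2 share one inbound neighbor (vertex 0) and have norms 1 and sqrt(2).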

I have to do this in GraphX. How should I approach this?

My approach: find all the neighboring nodes of each node, inner-join them in pairs, then update the weights.

But I am a little confused about writing efficient code in GraphX. How do I proceed? Code snippets are appreciated.

Amnesiac
  • What's your actual question, other than "gimme the code"? – The Archetypal Paul Apr 22 '16 at 08:23
  • Do we get to share the `A` in your class when you turn in our code? – David Griffin Apr 22 '16 at 15:57
  • Why does it need to be GraphX? – eliasah Apr 22 '16 at 17:10
  • @TheArchetypalPaul, I am implementing a subgraph-finding algorithm based on a paper. The first step is to change the adjacency matrix into a weighted matrix using the formula given above. I am confused about how to write code that efficiently updates the edge weights of the graph. So my question is: can you guide me on how to convert the adjacency matrix into the M matrix and update the weights? – Amnesiac Apr 22 '16 at 23:55
  • Still sounds like "gimme the code" to me. What have you tried, or where are you stuck adapting @MichaelMalak's answer? – The Archetypal Paul Apr 23 '16 at 09:25

1 Answer


For an example of using GraphX to efficiently process sparse matrices, see the source code to GraphX's implementation of SVD++.

https://github.com/apache/spark/blob/branch-1.6/graphx/src/main/scala/org/apache/spark/graphx/lib/SVDPlusPlus.scala

Basically, it just uses aggregateMessages(), so one message per non-zero entry of the adjacency matrix is sent to the adjacent vertex. The zero-valued entries of the adjacency matrix are never even considered (processed).
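The aggregateMessages idea can be imitated on plain Python collections to show why only the non-zero entries cost anything. This is not the Spark API: in GraphX, sendMsg receives an EdgeContext and calls sendToSrc/sendToDst, whereas the hypothetical `aggregate_messages` helper below has `send_msg` return a list of (target, message) pairs.

```python
def aggregate_messages(edges, send_msg, merge_msg):
    """Plain-Python imitation of the GraphX aggregateMessages pattern (not Spark code).

    edges: iterable of (src, dst, attr) triples, i.e. only the non-zero entries of A.
    send_msg(src, dst, attr): returns a list of (target_vertex, message) pairs.
    merge_msg(a, b): commutative, associative combine of two messages.
    Returns {vertex: merged_message}; vertices that receive nothing are absent.
    """
    result = {}
    for src, dst, attr in edges:
        for target, msg in send_msg(src, dst, attr):
            result[target] = msg if target not in result else merge_msg(result[target], msg)
    return result

# Hypothetical sparse graph: each triple is one non-zero entry A(src, dst) = attr.
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0)]

# One message per edge, sent to the destination; zero entries are never visited.
weight_sums = aggregate_messages(
    edges,
    send_msg=lambda src, dst, attr: [(dst, attr)],
    merge_msg=lambda a, b: a + b,
)
```

The work done is proportional to the number of edges, not to n^2, which is the property that matters for a very large sparse graph.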

EDIT (additional information):

First, you have to plan what is going to be stored at each vertex, and how you are going to collect this information to produce M(i,j) at the end. Notice that the two norms in the denominator, |A(:,i)| and |A(:,j)|, are used repeatedly: if there are n vertices in the graph (that is, if A is an n x n matrix), there are only n distinct values |A(:,i)|, even though there are n^2 entries M(i,j) to compute.

A good plan would be for each vertex i to store two things (e.g. in a Tuple2[Double, Array[Double]]): the number |A(:,i)| and the vector $V_i = [A(:,i)\cdot A(:,1), \ldots, A(:,i)\cdot A(:,n)]$. Then, at the very end, you would compute M by extracting this information from graph.vertices and combining it, since $M(i,j) = V_i[j] / (|A(:,i)|\,|A(:,j)|)$.
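A sketch of the final combining step, assuming each vertex's data has already been gathered into ordinary Python dicts (for example after collecting graph.vertices on a small graph). One deviation from the Tuple2-of-arrays layout: here V_i is stored sparsely as {j: dot product}, keeping only non-zero components, instead of as a dense array of length n. `combine_into_m` and the sample values (for the edges 0 -> 1, 0 -> 2, 1 -> 2) are hypothetical.

```python
import math

def combine_into_m(norms, v):
    """Combine per-vertex data into the non-zero entries of M.

    norms: {i: |A(:,i)|}
    v:     {i: {j: A(:,i) . A(:,j)}}, storing only non-zero dot products (a sparse V_i)
    Returns {(i, j): M(i, j)} for every pair with a non-zero dot product.
    """
    m = {}
    for i, row in v.items():
        for j, dot in row.items():
            # A non-zero dot product implies both columns are non-empty,
            # so neither norm is zero here.
            m[(i, j)] = dot / (norms[i] * norms[j])
    return m

# Hypothetical per-vertex values for edges 0 -> 1, 0 -> 2, 1 -> 2
norms = {0: 0.0, 1: 1.0, 2: math.sqrt(2)}
v = {1: {1: 1, 2: 1}, 2: {1: 1, 2: 2}}
m = combine_into_m(norms, v)
```

Pairs whose dot product is zero never appear in `m`, which matches M(i,j) = 0 for columns with no shared inbound neighbor.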

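|A(:,i)| is easy. Since A is a 0/1 matrix, column i contains one 1 per inbound edge of vertex i, so |A(:,i)|^2 is just the number of inbound edges of vertex i, and |A(:,i)| is its square root. (To see this, think about what it means for A to be an adjacency matrix and draw a diagram.)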
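The in-degree observation can be checked with a short plain-Python sketch, assuming the graph is given as a list of (src, dst) pairs with no duplicate edges and that |A(:,i)| is the Euclidean norm. In GraphX itself, graph.inDegrees gives the in-degrees as a VertexRDD[Int]; vertices with no inbound edges are absent from it, which the hypothetical `column_norms` below handles by defaulting to 0.

```python
import math
from collections import Counter

def column_norms(edges, vertices):
    """|A(:,i)| for a 0/1 adjacency matrix where A(k, i) = 1 iff there is an edge k -> i.

    Column i has one 1 per inbound edge, so its squared Euclidean norm is the in-degree.
    """
    in_degree = Counter(dst for _src, dst in edges)
    # Counter returns 0 for vertices with no inbound edges.
    return {i: math.sqrt(in_degree[i]) for i in vertices}

norms = column_norms([(0, 1), (0, 2), (1, 2)], vertices=range(3))
```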

V_i is a bit trickier, but not overly so. First, for each vertex we need to compute a vector, not just a single number as we did for |A(:,i)|. And each of the n components of that vector potentially requires up to n inputs.

Thinking back to the meaning of the adjacency matrix, to compute the jth component of V_i (which is $\sum_{k=1}^{n} A(k,i)\,A(k,j)$, a sum of n products), we only need to add 1 whenever some vertex k has an edge to both i and j. Therefore, one approach is to use aggregateMessages twice in a row to pass neighboring vertices along the edges. To use some really loose terminology: first from the j vertices to the k vertices (backward along the edges k -> j), and then from the k vertices to the i vertices. That way each vertex knows all of its neighbors within two hops (and it's OK for each vertex to accumulate that much information if A is sparse). This lets you compute V_i.
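The two rounds of message passing can be sketched on plain Python collections, assuming a simple directed graph given as (src, dst) pairs with no duplicate edges and A(k,i) = 1 meaning an edge k -> i. Round 1 plays the role of an aggregateMessages call that sends each destination ID back to the source; round 2 plays the role of a second call that forwards each source's collected set to its destinations. In GraphX you would presumably attach the round-1 sets to the vertices (e.g. with outerJoinVertices) before the second aggregateMessages; `common_in_neighbor_counts` is a hypothetical name.

```python
from collections import Counter, defaultdict

def common_in_neighbor_counts(edges):
    """Sparse V_i: V[i][j] = sum_k A(k,i) A(k,j) = number of vertices k
    with edges to both i and j (A(k, i) = 1 iff edge k -> i).

    Mirrors the two message-passing rounds on plain Python collections.
    """
    # Round 1: each j sends its own ID backward along its inbound edge k -> j,
    # so every k learns its set of out-neighbors.
    out_neighbors = defaultdict(set)
    for k, j in edges:
        out_neighbors[k].add(j)

    # Round 2: each k forwards that set along every out-edge k -> i,
    # and i counts how many times each j arrives.
    v = defaultdict(Counter)
    for k, i in edges:
        v[i].update(out_neighbors[k])
    return {i: dict(c) for i, c in v.items()}

v = common_in_neighbor_counts([(0, 1), (0, 2), (1, 2)])
```

Each vertex i ends up holding only the non-zero components of V_i, so its memory grows with the size of its two-hop neighborhood rather than with n.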

Michael Malak
  • Thanks for the reply. aggregateMessages works when I want to update the weights of edges. But I want to calculate the cosine of two columns of the adjacency matrix, as shown in the formula, and update the corresponding weight. So how do I achieve this? – Amnesiac Apr 23 '16 at 00:26
  • Updated my answer to outline a possible concrete approach. – Michael Malak Apr 23 '16 at 15:20