For an example of using GraphX to efficiently process sparse matrices, see the source code to GraphX's implementation of SVD++.
https://github.com/apache/spark/blob/branch-1.6/graphx/src/main/scala/org/apache/spark/graphx/lib/SVDPlusPlus.scala
Basically, it just uses aggregateMessages(), so one message gets sent to the adjacent vertex per non-zero entry of the adjacency matrix -- the zero-valued entries are never even considered (processed).
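For a flavor of that pattern, here is a minimal sketch (the function name and the Double attribute types are illustrative, not taken from the SVDPlusPlus source): a sparse matrix-vector product y = A^T * x done with aggregateMessages(), where each vertex k holds x(k) and each edge k -> i carries A(k,i), so exactly one message per non-zero entry is sent.

    import org.apache.spark.graphx._

    // Sketch only: assumes a Graph[Double, Double] whose vertex attribute is x(k)
    // and whose edge attribute on edge k -> i is A(k, i).
    def spmv(graph: Graph[Double, Double]): VertexRDD[Double] =
      graph.aggregateMessages[Double](
        ctx => ctx.sendToDst(ctx.attr * ctx.srcAttr), // one message per non-zero A(k, i)
        (a, b) => a + b                               // y(i) = sum over k of A(k, i) * x(k)
      )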
EDIT (additional information):
First, you have to plan what is going to get stored at each vertex, and also how you're going to collect this information to produce M(i,j) in the end. Notice that the two norms in the denominator, |A(:,i)| and |A(:,j)|, are used repeatedly. If there are n vertices in the graph (that is, if A is an n x n matrix), then there are only n |A(:,i)|'s even though there are n^2 M(i,j)'s to be computed.
A good plan would be for each vertex i to store two things (e.g. in a Tuple2: a Double and an Array[Double]): the scalar |A(:,i)| and the vector [A(:,i)·A(:,1), ..., A(:,i)·A(:,n)] (call this one Vi, so Vi(j) = sum_k A(k,i)*A(k,j)). Then at the very end you would compute M by extracting this information from graph.vertices and combining it to produce M.
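To make that last step concrete, here is a sketch of the final combination (the buildM name, the (Double, Array[Double]) vertex type, and the assumption that vertex ids run 0..n-1 so they can index the arrays are all illustrative, not prescribed): there are only n norms, so they are small enough to broadcast, and then each row of M is an element-wise division.

    import org.apache.spark.SparkContext
    import org.apache.spark.graphx._
    import org.apache.spark.rdd.RDD

    // Sketch only: each vertex i carries (|A(:,i)|, Vi), where Vi(j) = sum_k A(k,i)*A(k,j).
    // Returns row i of M as an Array[Double]: M(i, j) = Vi(j) / (|A(:,i)| * |A(:,j)|).
    def buildM(sc: SparkContext,
               vertices: VertexRDD[(Double, Array[Double])]): RDD[(VertexId, Array[Double])] = {
      val norms = sc.broadcast(
        vertices.map { case (i, (norm, _)) => (i, norm) }.collect().toMap)
      vertices.map { case (i, (normI, vi)) =>
        (i, vi.zipWithIndex.map { case (dot, j) => dot / (normI * norms.value(j.toLong)) })
      }
    }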
|A(:,i)| is easy. For each vertex i, just count its inbound edges: with 0/1 entries, that count is the squared Euclidean norm |A(:,i)|^2, so |A(:,i)| is its square root. (To see this, think about what it means for A to be an adjacency matrix and draw a diagram.)
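In GraphX that is essentially a one-liner (sketch; the columnNorms name is made up, and the square root assumes 0/1 entries with a Euclidean norm as above):

    import org.apache.spark.graphx._

    // Sketch only: in-degree of vertex i = |A(:,i)|^2 for a 0/1 adjacency matrix,
    // so the norm itself is the square root of the in-degree.
    def columnNorms(graph: Graph[Int, Int]): VertexRDD[Double] =
      graph.inDegrees.mapValues(d => math.sqrt(d.toDouble))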
Vi is a bit trickier, but not overly so. First, for each vertex we're going to need to come up with a whole vector, not just a single number like we did for |A(:,i)|. And each component of that length-n vector potentially requires up to n inputs to compute.
Thinking back to the meaning behind the adjacency matrix, to compute the jth component of Vi (which would be sum_k A(k,i)*A(k,j), i.e. a sum of n products), we only need to add a 1 whenever some vertex k has an edge to both i and j. Therefore, an approach you could take is to use aggregateMessages() twice in a row to transmit vertex ids between neighbors. To use some really loose terminology: first from the j vertices back to the k vertices (along the k -> j edges), and then from the k vertices on to the i vertices (along the k -> i edges). That way each vertex knows all of its neighbors within two hops (and it's OK for each vertex to accumulate that much information if A is sparse). This will allow you to compute Vi, as sketched below.
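Here is a sketch of those two rounds (the computeVi name is hypothetical; again it assumes vertex ids 0..n-1 so they can index an Array of length n):

    import org.apache.spark.graphx._

    // Sketch only.
    // Round 1: each vertex k collects the ids of the vertices j it points to
    //          (messages travel backward along the k -> j edges).
    // Round 2: k forwards that list along each k -> i edge, and vertex i tallies
    //          how often each j shows up; that tally is Vi(j), the number of
    //          vertices k with an edge to both i and j.
    def computeVi(graph: Graph[Int, Int], n: Int): VertexRDD[Array[Double]] = {
      val outNbrs: VertexRDD[Array[VertexId]] = graph.aggregateMessages[Array[VertexId]](
        ctx => ctx.sendToSrc(Array(ctx.dstId)),
        (a, b) => a ++ b
      )
      val g2 = graph.outerJoinVertices(outNbrs) {
        (_, _, nbrs) => nbrs.getOrElse(Array.empty[VertexId])
      }
      val twoHop: VertexRDD[Array[VertexId]] = g2.aggregateMessages[Array[VertexId]](
        ctx => ctx.sendToDst(ctx.srcAttr),
        (a, b) => a ++ b
      )
      twoHop.mapValues { js =>
        val v = new Array[Double](n)
        js.foreach(j => v(j.toInt) += 1.0)
        v
      }
    }

Note that when k forwards its out-neighbor list to i, that list includes i itself, so Vi(i) comes out as the in-degree of i, i.e. |A(:,i)|^2 -- a handy consistency check.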