
I have a matrix stored in tuples

mat = RDD[(i-idx, j-idx, value)]

and a vector in tuples

vec = RDD[(idx, value)]

And I want to compute mat * vec. What I planned to do is:

val map_mat = mat.map(x => (x._2, (x._1, x._3))) // Turn mat into key-value pairs keyed by j-idx.
val join_mat = map_mat.join(vec) // Join the mat and vec on (j-idx == idx)
val mul_mat = join_mat.map(x => (x._2._1._1, x._2._1._2 * x._2._2))
val res_vec = mul_mat.reduceByKey(_ + _)

After each of these steps, I should get:

join_mat: RDD[(j-idx, ((i-idx, value1), value2))]
mul_mat: RDD[(i-idx, value1 * value2)]
res_vec: RDD[(i-idx, sum(value1 * value2))]

It may work, but I think code like this is too messy to write... especially the statement for mul_mat.
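For what it's worth, the same pipeline reads much better when the closures use pattern matching instead of positional `_1`/`_2` accessors. Here is a minimal sketch of my pipeline; to keep it runnable without a cluster it uses plain Scala collections (a `Map` lookup standing in for the `join`, `groupBy` + `sum` standing in for `reduceByKey(_ + _)`), but the same `case`-style closures work unchanged on pair RDDs:

```scala
object MatVec {
  // mat entries: (i-idx, j-idx, value); vec entries: (idx, value)
  val mat = Seq((0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 4.0))
  val vec = Map(0 -> 10.0, 1 -> 20.0)

  def multiply(): Map[Int, Double] =
    mat
      .map { case (i, j, v) => (j, (i, v)) }            // key each entry by j-idx
      .collect { case (j, (i, v)) if vec.contains(j) => // "join" with vec on j-idx
        (i, v * vec(j))                                 // partial product for row i
      }
      .groupBy(_._1)                                    // reduceByKey(_ + _)
      .map { case (i, products) => (i, products.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    val res = multiply()
    // [[1,2],[3,4]] * [10,20] = [50, 110]
    assert(res(0) == 50.0)
    assert(res(1) == 110.0)
    println(res.toSeq.sortBy(_._1))
  }
}
```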

Is there any way better to do this job? Thanks!

I do not want to use MLlib, and MLlib doesn't implement matrix multiplication for CoordinateMatrix anyway, so I have to implement it myself.

qin.sun
    Possible duplicate of [Matrix Multiplication in Apache Spark](http://stackoverflow.com/questions/33558755/matrix-multiplication-in-apache-spark) – zero323 May 14 '16 at 10:34
  • If you just want better syntax, use pattern-matching: `join_mat.map { case (_, ((x1, x2), y)) => (x1, x2 * y) }`. However, @zero323 answer looks much better than ad-hoc solution. – Vitalii Kotliarenko May 14 '16 at 14:40

0 Answers