How to make a distributed BlockMatrix out of Matrices (of the same size)?

For example, let A and B be two 2 by 2 mllib.linalg matrices, defined as follows:

import org.apache.spark.mllib.linalg.{Matrix, Matrices}
import org.apache.spark.mllib.linalg.distributed.BlockMatrix

val A: Matrix = Matrices.dense(2, 2, Array(1.0, 2.0, 3.0, 4.0))
val B: Matrix = Matrices.dense(2, 2, Array(5.0, 6.0, 7.0, 8.0))
val C = new BlockMatrix(???)

How can I first build an RDD[((Int, Int), Matrix)] from A and B, and then create a distributed BlockMatrix from it?

I'd appreciate any comment or help in advance.

zero323
Ehsan M. Kermani

1 Answer

You can construct the BlockMatrix by first creating the RDD[((Int, Int), Matrix)], in which each matrix is keyed by its (row, column) block index:

val blocks: RDD[((Int, Int), Matrix)] = sc.parallelize(Seq(((0, 0), A), ((0, 1), B)))

and then passing it to the BlockMatrix constructor, together with the number of rows and columns per block:

val blockMatrix: BlockMatrix = new BlockMatrix(blocks, 2, 2)

This gives you a 2 by 4 BlockMatrix of the form [A | B], with A and B placed side by side.
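
For reference, the two steps can be put together roughly as follows. This is a minimal sketch that assumes it is run as a script or pasted into spark-shell. The local master setting and the app name "BlockMatrixSketch" are assumptions for illustration, not part of the original answer.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Matrix, Matrices}
import org.apache.spark.mllib.linalg.distributed.BlockMatrix
import org.apache.spark.rdd.RDD

// Assumption: a local context for experimenting; getOrCreate reuses
// an already running context (e.g. the `sc` of spark-shell) if there is one.
val conf = new SparkConf().setAppName("BlockMatrixSketch").setMaster("local[2]")
val sc = SparkContext.getOrCreate(conf)

// Matrices.dense takes its values in column-major order.
val A: Matrix = Matrices.dense(2, 2, Array(1.0, 2.0, 3.0, 4.0))
val B: Matrix = Matrices.dense(2, 2, Array(5.0, 6.0, 7.0, 8.0))

// Step 1: key each block by its (blockRowIndex, blockColIndex).
val blocks: RDD[((Int, Int), Matrix)] =
  sc.parallelize(Seq(((0, 0), A), ((0, 1), B)))

// Step 2: build the BlockMatrix with 2 rows and 2 columns per block.
val blockMatrix = new BlockMatrix(blocks, 2, 2)

// Throws an exception if the block sizes or indices are inconsistent.
blockMatrix.validate()

// Collects everything to the driver; only sensible for small matrices.
val local: Matrix = blockMatrix.toLocalMatrix()
```

Placing B under the key (1, 0) instead of (0, 1) would stack the two blocks vertically rather than side by side.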

Till Rohrmann
  • Right! so I need to `import org.apache.spark.rdd.RDD` first and define the type of blocks. I was just using the `sc.parallelize` and it didn't work. Thank you. – Ehsan M. Kermani Jul 27 '15 at 16:02
  • Do you know if there's a way to get a particular block ( block i,j) from blockMatrix above? – Ehsan M. Kermani Jul 27 '15 at 17:38
  • You can do this by accessing the underlying blocks `RDD`. E.g. `blockMatrix.blocks.filter{case ((x, y), matrix) => x == i && y == j}` will give you an `RDD[((Int, Int), Matrix)]` with only the selected block in it. – Till Rohrmann Jul 28 '15 at 07:23
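
The filter from the comment above can be wrapped in a small helper that brings the selected block back to the driver. This is only a sketch: `getBlock` is a hypothetical name, not part of the Spark API, and the local context setup is an assumption for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Matrix, Matrices}
import org.apache.spark.mllib.linalg.distributed.BlockMatrix
import org.apache.spark.rdd.RDD

// Assumption: a local context; reuses an existing one if present.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("GetBlockSketch").setMaster("local[2]"))

val A: Matrix = Matrices.dense(2, 2, Array(1.0, 2.0, 3.0, 4.0))
val B: Matrix = Matrices.dense(2, 2, Array(5.0, 6.0, 7.0, 8.0))
val blocks: RDD[((Int, Int), Matrix)] =
  sc.parallelize(Seq(((0, 0), A), ((0, 1), B)))
val blockMatrix = new BlockMatrix(blocks, 2, 2)

// Hypothetical helper: returns block (i, j) as a local Matrix,
// or None if no block is stored under that index.
def getBlock(bm: BlockMatrix, i: Int, j: Int): Option[Matrix] =
  bm.blocks
    .filter { case ((x, y), _) => x == i && y == j }
    .map { case (_, matrix) => matrix }
    .collect()
    .headOption

val b01: Option[Matrix] = getBlock(blockMatrix, 0, 1) // the block holding B
val b11: Option[Matrix] = getBlock(blockMatrix, 1, 1) // no such block here
```

Returning an `Option` avoids an exception when the requested index has no block, which can happen because a BlockMatrix does not need to store every block explicitly.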