
I have an MLlib distributed row matrix in which row order doesn't matter. Is there an easy way to convert it into a Breeze dense matrix? I'd imagine a row-by-row mapping might work, but I'm relatively unfamiliar with Breeze as a whole.

Edit: Using `X.rows.map(x => x.toArray)`, I've managed to convert it into an RDD of type `org.apache.spark.rdd.RDD[Array[Double]]`. I believe this is a step in the right direction...

mongolol
  • Did you try doing a `collect` of the RDD and then converting it into a breeze matrix? – ar7 Oct 12 '16 at 15:41
  • Hmm, using collect on the RDD, I end up with: `breeze.linalg.DenseMatrix[Array[Double]]`. What I need is `breeze.linalg.DenseMatrix[Double]`. Thanks for the suggestion, however. I believe I just need to convert the array into a vector. – mongolol Oct 12 '16 at 15:52

2 Answers


Do a collect on your RDD. It'll return you an Array[Array[Double]].

val array = your_rdd.collect()

One way to convert the array of arrays into a matrix is the following:

import breeze.linalg.DenseMatrix
val dm = DenseMatrix(array: _*) // each inner Array[Double] becomes one row

Part of the answer was taken from here. Hope this solves the problem.
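A minimal sketch of this approach, assuming Breeze is on the classpath and using a small hard-coded `Array[Array[Double]]` in place of the collected RDD (the object name `CollectedRowsToMatrix` and the sample values are made up for illustration). The `: _*` is what spreads the outer array so that each inner array is passed as a separate row; without it the whole outer array would be passed as a single row element, which would explain the `DenseMatrix[Array[Double]]` mentioned in the comments.

```scala
import breeze.linalg.DenseMatrix

// Hypothetical example object; the data stands in for `your_rdd.collect()`
object CollectedRowsToMatrix {
  // Each inner array is one matrix row (2 rows, 3 columns)
  val array: Array[Array[Double]] = Array(
    Array(1.0, 2.0, 3.0),
    Array(4.0, 5.0, 6.0)
  )

  // `: _*` spreads the outer array into varargs, so every Array[Double]
  // becomes one row and the result is a DenseMatrix[Double]
  val dm: DenseMatrix[Double] = DenseMatrix(array: _*)

  def main(args: Array[String]): Unit = println(dm)
}
```

This copies the row data into a new matrix, so it only makes sense when the collected matrix fits comfortably in driver memory.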

ar7

Ended up getting it working with the below code.

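import breeze.linalg.{DenseMatrix => BDM}
// Flatten the collected rows into one array (row-major order)
val arr = X.rows.map(x => x.toArray).collect.flatten
// Breeze's 3-argument constructor reads `data` column-major, so build the (cols x rows) matrix and transpose it
val dm = new BDM(X.numCols().toInt, X.numRows().toInt, arr).t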

Thanks, @ar7, for the help.
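One caveat with the three-argument `new BDM(rows, cols, data)` constructor: Breeze reads `data` in column-major order, while flattening the collected rows produces row-major data, so for a general matrix the entries end up in the wrong positions. The sketch below shows the problem and two ways around it on a small hard-coded 2x3 example (no Spark involved; the object name `RowMajorFlatten` and the values are hypothetical stand-ins for `X.rows.map(x => x.toArray).collect`).

```scala
import breeze.linalg.DenseMatrix

// Hypothetical example object; `rows` stands in for the collected RDD
object RowMajorFlatten {
  val rows: Array[Array[Double]] = Array(Array(1.0, 2.0, 3.0), Array(4.0, 5.0, 6.0))
  val numRows: Int = rows.length
  val numCols: Int = rows.head.length

  // Row-major flatten: 1, 2, 3, 4, 5, 6
  val arr: Array[Double] = rows.flatten

  // The 3-argument constructor fills column by column, so this scrambles entries:
  // its first row comes out as 1.0, 3.0, 5.0 instead of 1.0, 2.0, 3.0
  val scrambled: DenseMatrix[Double] = new DenseMatrix(numRows, numCols, arr)

  // Option 1: build the (cols x rows) matrix, whose columns are the original rows, then transpose
  val viaTranspose: DenseMatrix[Double] = new DenseMatrix(numCols, numRows, arr).t

  // Option 2: declare the data row-major via offset = 0, majorStride = numCols, isTranspose = true
  val viaFlag: DenseMatrix[Double] = new DenseMatrix(numRows, numCols, arr, 0, numCols, true)

  def main(args: Array[String]): Unit = {
    println(scrambled)
    println(viaTranspose)
  }
}
```

Either `viaTranspose` or `viaFlag` could replace the final line of the snippet above. As far as I know, `.t` on a `DenseMatrix` just flips the layout flag over the same backing array rather than copying, so neither option should cost an extra copy of the data.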

mongolol