Using Spark, I have a data structure of type val rdd = RDD[(x: Int, y:Int), cov:Double]
in Scala, where each element of the RDD represents an element of a matrix with x
representing the row, y
representing the column and cov
representing the value of the element:
I need to create SparseVectors from rows of this matrix. So I decided to first convert the rdd to RDD[x: Int, (y:Int, cov:Double)]
and then use groupByKey to put all elements of a specific row together like this:
val rdd2 = rdd.map{case ((x,y),cov) => (x, (y, cov))}.groupByKey()
Now I need to create the SparseVectors:
val N = 7 //Vector Size
val spvec = {(x: Int,y: Iterable[(Int, Double)]) => new SparseVector(N.toLong, Array(y.map(el => el._1.toInt)), Array(y.map(el => el._2.toDouble)))}
val vecs = rdd2.map(spvec)
However, this is the error that pops up.
type mismatch; found :Iterable[Int] required:Int
type mismatch; found :Iterable[Double] required:Double
I am guessing that y.map(el => el._1.toInt)
is returning an iterable which Array cannot be applied on. I would appreciate if someone could help with how to do this.