I am struggling with some very basic Spark code. I would like to define a matrix x with 2 columns. This is what I have tried:

scala> val s = breeze.linalg.linspace(-3,3,5)
s: breeze.linalg.DenseVector[Double] = DenseVector(-3.0, -1.5, 0.0, 1.5, 3.0) // in this case I want s to be both column 1 and column 2 of x

scala> val ss = s.toArray ++ s.toArray
ss: Array[Double] = Array(-3.0, -1.5, 0.0, 1.5, 3.0, -3.0, -1.5, 0.0, 1.5, 3.0)

scala> import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.linalg.distributed.RowMatrix

scala> val mat = new RowMatrix(ss, 5, 2)
<console>:17: error: type mismatch;
 found   : Array[Double]
 required: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector]
       val mat = new RowMatrix(ss, 5, 2)

I do not understand how to transform the values so that I can pass them to the distributed matrix.
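`RowMatrix` needs one `Vector` per row, while `ss` is a flat array laid out column by column (the same column-major layout that `new breeze.linalg.DenseMatrix(5, 2, ss)` uses), so entry (i, j) of a 5 x 2 matrix sits at index `i + j * 5`. Below is a minimal, Spark-free sketch of that regrouping in plain Scala. `RowsFromColumnMajor` and `columnMajorToRows` are hypothetical names; each resulting row would still need to be wrapped with MLlib's `Vectors.dense` and put into an `RDD` (for example with `sc.parallelize`) before it can become a `RowMatrix`.

```scala
object RowsFromColumnMajor {
  // Hypothetical helper: regroup a column-major flat array into rows.
  // Entry (i, j) of a numRows x numCols matrix lives at index i + j * numRows.
  def columnMajorToRows(data: Array[Double], numRows: Int, numCols: Int): Array[Array[Double]] = {
    require(data.length == numRows * numCols, "data length must equal numRows * numCols")
    Array.tabulate(numRows)(i => Array.tabulate(numCols)(j => data(i + j * numRows)))
  }

  def main(args: Array[String]): Unit = {
    val s = Array(-3.0, -1.5, 0.0, 1.5, 3.0)
    val ss = s ++ s // same layout as in the question: column 1, then column 2
    val rows = columnMajorToRows(ss, 5, 2)
    // each row holds the same value twice, e.g. "-3.0 -3.0"
    rows.foreach(r => println(r.mkString(" ")))
  }
}
```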

EDIT: I may have been able to solve it:

scala> val s = breeze.linalg.linspace(-3,3,5)
s: breeze.linalg.DenseVector[Double] = DenseVector(-3.0, -1.5, 0.0, 1.5, 3.0)

scala> val ss = s.toArray ++ s.toArray
ss: Array[Double] = Array(-3.0, -1.5, 0.0, 1.5, 3.0, -3.0, -1.5, 0.0, 1.5, 3.0)

scala> val x = new breeze.linalg.DenseMatrix(5, 2, ss)
x: breeze.linalg.DenseMatrix[Double] = 
-3.0  -3.0  
-1.5  -1.5  
0.0   0.0   
1.5   1.5   
3.0   3.0   

scala> val xDist = sc.parallelize(x.toArray)
xDist: org.apache.spark.rdd.RDD[Double] = ParallelCollectionRDD[0] at parallelize at <console>:18
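Note that `xDist` above is an `RDD[Double]` holding the ten matrix entries one by one, not an `RDD` of row vectors, so it still cannot be passed to `RowMatrix`. Below is a minimal sketch, assuming Breeze and Spark MLlib are both on the classpath, of going from the Breeze `DenseMatrix` to a `RowMatrix` by building one MLlib `Vector` per row. `BreezeToRowMatrix` and `toRowMatrix` are hypothetical names, and the `local[*]` master is only meant for experimenting.

```scala
import breeze.linalg.{DenseMatrix, linspace}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix

object BreezeToRowMatrix {
  // Hypothetical helper: copy each row of the Breeze matrix into an MLlib
  // dense Vector, distribute those rows as an RDD and wrap it in a RowMatrix.
  def toRowMatrix(sc: SparkContext, x: DenseMatrix[Double]): RowMatrix = {
    val rows: Seq[Vector] =
      (0 until x.rows).map(i => Vectors.dense(Array.tabulate(x.cols)(j => x(i, j))))
    new RowMatrix(sc.parallelize(rows))
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("breeze-to-rowmatrix").setMaster("local[*]"))
    try {
      val lin = linspace(-3, 3, 5)                              // -3.0, -1.5, 0.0, 1.5, 3.0
      val x = new DenseMatrix(5, 2, lin.toArray ++ lin.toArray) // both columns equal lin
      val mat = toRowMatrix(sc, x)
      println(s"${mat.numRows()} x ${mat.numCols()}")          // 5 x 2
    } finally {
      sc.stop()
    }
  }
}
```

Building the rows on the driver is fine for a small matrix like this one; for large data the rows would normally be produced inside the RDD instead of being collected in a local `Seq` first.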
Donbeo
  • `makeRDD` (method of `SparkContext`) will make an `RDD` from a collection, so probably you want `sc.makeRDD(ss)` as your first arg to `RowMatrix`? – The Archetypal Paul Feb 05 '15 at 18:00
  • It's also referring to an MLlib type, you may want to look at the [vector constructor](http://spark.apache.org/docs/1.2.0/mllib-data-types.html) examples on that page. – Rich Henry Feb 05 '15 at 21:46
  • @RichHenry I was looking at that example, but I do not understand how I can construct a vector if I need something like a linspace – Donbeo Feb 06 '15 at 00:33
  • @Paul your solution does not work `scala> val mat = new RowMatrix(sc.makeRDD(ss), 5, 2) <console>:17: error: type mismatch; found : Array[Double] required: Seq[org.apache.spark.mllib.linalg.Vector] val mat = new RowMatrix(sc.makeRDD(ss), 5, 2)` – Donbeo Feb 06 '15 at 00:47
  • Your code seems a bit confused. RowMatrix wants an RDD containing the rows of the matrix, with each row a Vector. So, five rows each with 2 columns, in your example. You're passing it a single array of 10 doubles. And in `s` you seem to be constructing a Vector of two columns, not five – The Archetypal Paul Feb 06 '15 at 07:40
  • er, 5 columns and not 2. – The Archetypal Paul Feb 06 '15 at 08:00
  • But how can I make the matrix with two columns each one of them is s? – Donbeo Feb 06 '15 at 08:56
  • Something like `val c = Array(-3.0, -1.5, 0.0, 1.5, 3.0) ; val t = Array(c,c).transpose.map(r=>new DenseVector(r)); val rdd = sc.makeRDD(t)`? (not tested, I don't have the linalg stuff installed) – The Archetypal Paul Feb 06 '15 at 09:06
  • It's not working. I am afraid I will have to wait for better answers. – Donbeo Feb 06 '15 at 10:06
  • What's "not working" about it? When I get a minute I'll see if I can install the linalg stuff. But that code does produce a 2-column, 5-row RDD, I think. – The Archetypal Paul Feb 06 '15 at 10:19

1 Answer

Something like this. This typechecks, but for some reason won't run in my Scala worksheet.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

val conf = new SparkConf().setAppName("spark-scratch").setMaster("local")
val sc = new SparkContext(conf)

// the values for the column in each row
val col = List(-3.0, -1.5, 0.0, 1.5, 3.0)

// List(col, col) holds the two columns (2 x 5); transposing it gives
// five rows of two values each, and each row becomes an MLlib dense Vector
val t = List(col, col).transpose.map(r => Vectors.dense(r.toArray))

// make an RDD from the resulting sequence of Vectors, and
// make a RowMatrix (5 rows, 2 columns) from that
val rm = new RowMatrix(sc.makeRDD(t))
println(s"${rm.numRows()} x ${rm.numCols()}") // prints "5 x 2"

sc.stop()
The Archetypal Paul