
I am fairly new to Scala and Spark. I would like to generate a vector on each of the workers. When I use the following line, I get two errors:

val b = sc.parallelize(1 to n, n).map( i => DenseVector[Double]](10.0,20.0,30.0,40.0))
  1. No ClassTag available for Vec[Double]
  2. not enough arguments for method map: (implicit evidence$3: scala.reflect.ClassTag[Vec[Double]])org.apache.spark.rdd.RDD[Vec[Double]]. Unspecified value parameter evidence$3.

Could anybody help me with this?

Ben Reich
pc27149
  • I believe there are several syntax errors in that one line. Fix them and then maybe we can help. – sarveshseri Feb 13 '15 at 19:36
  • The full signature of the parallelize function is `def parallelize[T](seq: Seq[T], numSlices: Int = defaultParallelism)(implicit arg0: ClassTag[T]): RDD[T]`. Note the implicit parameter: you need an implicit ClassTag instance for the respective type. – sarveshseri Feb 13 '15 at 19:41
  • Thanks. I think I confused the generic vector in Scala with the vector in the Breeze package. It should be written this way: `val b = sc.parallelize(1 to 4, 4).map( i => (10,20,30,40).asInstanceOf[DenseVector[Double]])` – pc27149 Feb 14 '15 at 01:14

1 Answer


The following works for me:

scala> import breeze.linalg._
scala> val n = 2
n: Int = 2
scala> val b = sc.parallelize(1 to n, n).map( i => DenseVector[Double](10.0,20.0,30.0,40.0))
b: org.apache.spark.rdd.RDD[breeze.linalg.DenseVector[Double]] = MapPartitionsRDD[1] at map at <console>:24

Here is some data in the RDD:

scala> b.take(1)
res1: Array[breeze.linalg.DenseVector[Double]] = Array(DenseVector(10.0, 20.0, 30.0, 40.0))
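
For readers wondering where the ClassTag requirement comes from, here is a minimal sketch that uses only the Scala standard library, with no Spark or Breeze. The error message in the question shows that `map` takes an implicit `ClassTag` for its result type. The names `mapToArray` and `replicate` below are hypothetical stand-ins for illustration, not Spark APIs, and the standard-library `Vector` stands in for `breeze.linalg.DenseVector`. The point is that the compiler supplies the tag on its own when the result type is a concrete class it can resolve, while generic code has to ask for one with a context bound.

```scala
import scala.reflect.ClassTag

object ClassTagSketch {
  // Hypothetical stand-in for RDD.map: like the signature quoted in the
  // error message, it needs an implicit ClassTag for the result type U.
  def mapToArray[T, U](xs: Seq[T])(f: T => U)(implicit tag: ClassTag[U]): Array[U] = {
    val out = tag.newArray(xs.length) // the tag is what lets us allocate an Array[U]
    var i = 0
    for (x <- xs) { out(i) = f(x); i += 1 }
    out
  }

  // In generic code the compiler cannot materialize a tag for an abstract V,
  // so the context bound [V: ClassTag] asks the caller to provide one.
  def replicate[V: ClassTag](n: Int, v: V): Array[V] =
    mapToArray(1 to n)(_ => v)

  def main(args: Array[String]): Unit = {
    // Vector[Double] is a concrete, resolvable type, so the tag is found implicitly.
    val vs = mapToArray(1 to 2)(_ => Vector(10.0, 20.0, 30.0, 40.0))
    println(vs.mkString(", "))
  }
}
```

Dropping the `: ClassTag` bound from `replicate` should produce a "No ClassTag available for V" compile error similar to the one in the question, which suggests checking that the vector type in the original line resolves to the concrete class you intended (here, via `import breeze.linalg._`).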
WestCoastProjects