
I am trying to perform a Scala operation on Shark. I am creating an RDD as follows:

val tmp: shark.api.TableRDD = sc.sql2rdd("select duration from test")

I need to convert it to RDD[Array[Double]]. I tried toArray, but it doesn't seem to work.

I also tried converting it to Array[String] and then converting using map as follows:

val tmp_2 = tmp.map(row => row.getString(0))
val tmp_3 = tmp_2.map { row => 
  val features = Array[Double] (row(0))
}

But this gives me a Spark RDD[Unit], which cannot be used in the function. Is there any other way to proceed with this type conversion?
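The RDD[Unit] comes from the shape of the block, not from the array itself: in Scala, a block whose last statement is a `val` definition evaluates to Unit. A minimal plain-Scala sketch (a List stands in for the RDD here; `map` has the same semantics):

```scala
object UnitVsArray extends App {
  val rows = List("296.98567", "230.84362")

  // A block ending in a `val` definition evaluates to Unit,
  // so this map produces List[Unit] (RDD[Unit] on Spark):
  val broken = rows.map { row =>
    val features = Array[Double](row.toDouble) // computed, then discarded
  }

  // Making the array the block's last expression yields List[Array[Double]]:
  val fixed = rows.map { row =>
    Array[Double](row.toDouble)
  }

  println(broken) // List((), ())
  println(fixed.map(_.mkString(",")))
}
```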

Edit: I also tried using toDouble, but this gives me an RDD[Double], not an RDD[Array[Double]]:

val tmp_5 = tmp_2.map(_.toDouble)
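To get arrays instead of bare doubles, each value has to be wrapped in a one-element Array inside the map. A plain-Scala sketch of the two element types side by side (List stands in for the RDD; the name `tmp_6` is hypothetical):

```scala
object WrapInArray extends App {
  // tmp_2 plays the role of the RDD[String]
  val tmp_2 = List("296.98567", "230.84362")

  val tmp_5 = tmp_2.map(_.toDouble)             // List[Double]: one plain value per row
  val tmp_6 = tmp_2.map(s => Array(s.toDouble)) // List[Array[Double]]: one-element array per row

  println(tmp_5)
  println(tmp_6.map(_.mkString(",")))
}
```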

Edit 2:

I managed to do this as follows:

A sample of the data:

296.98567000000003
230.84362999999999
212.89751000000001
914.02404000000001
305.55383

A Spark Table RDD was created first.

val tmp = sc.sql2rdd("select duration from test")

I made use of getString to convert it to an RDD[String] and then converted that to an RDD[Array[Double]]:

val duration = tmp.map(row => Array[Double](row.getString(0).toDouble))
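A quick sanity check of the same conversion on the sample values above (plain collections stand in for the RDD; on Spark the identical map runs distributed):

```scala
object DurationCheck extends App {
  val sample = List("296.98567000000003", "230.84362999999999",
                    "212.89751000000001", "914.02404000000001", "305.55383")

  // Same conversion as on the TableRDD: String -> one-element Array[Double]
  val duration = sample.map(row => Array[Double](row.toDouble))

  // Each element is a one-element Array[Double], the shape expected by
  // functions taking RDD[Array[Double]]:
  println(duration.map(_.mkString(",")))
}
```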
    I assume your query `select duration from test` is returning only the `duration` column. Do you want to convert each single entry into an array or do you want to convert the result set into an array? In the second case, why would you still want that single array as an RDD? – maasg Jun 14 '14 at 11:05
  • @maasg I would like to convert each entry from the result (i.e. duration column values) into an array. I'm using an existing function and it needs the input datatype to be RDD[Array[Double]] – visakh Jun 16 '14 at 06:40
  • adding some sample data would help, I think. – maasg Jun 16 '14 at 08:23
  • @maasg I have edited the question to add an approach which works for now. – visakh Jun 16 '14 at 11:37
