
I'm using Spark 2.0. One of my DataFrame columns contains a WrappedArray of WrappedArrays of Float.

An example of a row would be:

[[1.0 2.0 2.0][6.0 5.0 2.0][4.0 2.0 3.0]]

I'm trying to transform this column into an Array[Array[Float]].

What I tried so far is the following:

dataframe.select("mycolumn").rdd.map(r => r.asInstanceOf[Array[Array[Float]]])

but I get the following error:

Caused by: java.lang.ClassCastException:
 org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to [[F

Any idea would be highly appreciated. Thanks
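
For reference, here is a minimal way to build a DataFrame with a column like mine (the SparkSession setup and the Tuple1 wrapper are only for illustration, not how my data is actually produced):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().master("local[*]").getOrCreate()
  import spark.implicits._

  // Wrap the nested Seq in a Tuple1 so Spark's product encoder can build the row.
  val dataframe = Seq(
    Tuple1(Seq(Seq(1.0f, 2.0f, 2.0f), Seq(6.0f, 5.0f, 2.0f), Seq(4.0f, 2.0f, 3.0f)))
  ).toDF("mycolumn")

  dataframe.printSchema()   // mycolumn: array<array<float>>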


2 Answers


Try this:

  import scala.collection.mutable.WrappedArray

  // Placeholder to show the types; in practice the value comes from a DataFrame row.
  val wawa: WrappedArray[WrappedArray[Float]] = null
  val res: Array[Array[Float]] = wawa.map(inner => inner.array).toArray

It compiles for me
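
To check it with concrete values instead of null (the numbers are made up; Predef's implicit array wrappers turn the Array literals into WrappedArrays, and the import above is reused):

  val wawa2: WrappedArray[WrappedArray[Float]] =
    Array[WrappedArray[Float]](Array(1.0f, 2.0f, 2.0f), Array(6.0f, 5.0f, 2.0f))
  val res2: Array[Array[Float]] = wawa2.map(inner => inner.array).toArray
  // Prints "1.0, 2.0, 2.0" and "6.0, 5.0, 2.0"
  res2.foreach(a => println(a.mkString(", ")))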

Sami Badawi
  • Thanks Sami, your answer led me to the final resolution. I'll update my own answer with the exact code for those like me who started with a dataframe. – bobo32 Jan 30 '17 at 19:04
  • My first question on Stack Overflow got 3 answers, none of them worked, but by combining them I found a solution. :D – Sami Badawi Jan 30 '17 at 19:30

Following @sami-badawi's answer, I am posting the answer for those like me who started from a DataFrame.

import scala.collection.mutable.WrappedArray

dataframe.select("mycolumn").rdd.map(row =>
  row.get(0).asInstanceOf[WrappedArray[WrappedArray[Float]]].array.map(x => x.toArray))
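
If you then want the values back on the driver as plain Scala arrays, a collect at the end does it. This is only a sketch built on the line above; getAs[T](0) is just a shorthand for get(0).asInstanceOf[T]:

  import scala.collection.mutable.WrappedArray

  val arrays: Array[Array[Array[Float]]] = dataframe
    .select("mycolumn")
    .rdd
    .map(row => row.getAs[WrappedArray[WrappedArray[Float]]](0).map(_.toArray).toArray)
    .collect()

  // One Array[Array[Float]] per row of the original DataFrame.
  arrays.foreach(a => println(a.map(_.mkString("[", " ", "]")).mkString(" ")))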
bobo32
  • I tried to print out the values as below: rdd.map(row => row.get(0).asInstanceOf[WrappedArray[WrappedArray[String]]].toSeq.map(x => x.toSeq.foreach(println))) – Burt Nov 20 '17 at 13:30