
I am a beginner with Spark/Scala. I would like to extract a value (Double) from an Array selected from a Dataset. The simplified major steps are shown below. How can I extract each Double value from the last val, wpA? Something like val p1 = wpA(1). I failed to convert it to a normal array with wpA.toArray.

Thank you in advance for your help.

case class Event(eventId: Int, n_track: Int, px:ArrayBuffer[Double],py: ArrayBuffer[Double], pz: ArrayBuffer[Double],ch: ArrayBuffer[Int], en: ArrayBuffer[Double])
---
val rawRdd =  sc.textFile("expdata/rawdata.bel").map(_.split("\n"))
val eventRdd = rawRdd.map(x => buildEvent(x(0).toString))
val dataset = sqlContext.createDataset[Event](eventRdd) 
dataset.printSchema()
    root
      |-- eventId: integer (nullable = false)
      |-- n_track: integer (nullable = false)
      |-- px: array (nullable = true)
      |    |-- element: double (containsNull = false)
      |-- py: array (nullable = true)
      |    |-- element: double (containsNull = false)
      |-- pz: array (nullable = true)
      |    |-- element: double (containsNull = false)
      |-- ch: array (nullable = true)
      |    |-- element: integer (containsNull = false)
      |-- en: array (nullable = true)
      |    |-- element: double (containsNull = false)

val dataFrame  = dataset.select("px")     
val dataRow =  dataFrame.collect()      
val wpA = dataRow(1)(0)  
println(wpA)
      WrappedArray(-0.99205, 0.379417, 0.448819,.....)
W. Yoshyk

1 Answer


When you write:

val wpA = dataRow(1)(0)  

You get a variable of type Any, because org.apache.spark.sql.Row.apply(Int) (the method called here on the result of dataRow(1)) returns Any.

Since you know the expected type of the first item (index = 0) of this row, you should use Row.getAs[T](Int) and indicate that you expect a WrappedArray. The compiler will then know that wpA is an array, and you'll be able to use any of its methods (including the apply method that takes an Int and can be called using parentheses only):

import scala.collection.mutable

val wpA = dataRow(1).getAs[mutable.WrappedArray[Double]](0)
println(wpA) // WrappedArray(-0.99205, 0.379417, 0.448819,.....)
println(wpA(0)) // -0.99205
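
If you would rather not name the concrete collection class (WrappedArray is deprecated as of Scala 2.13), Row.getSeq[T](Int) is a possible alternative. The following is a minimal sketch, not taken from the question's data: the row is a hypothetical stand-in built by hand with Row(...), and the names row, px, p1 and plain are only for illustration. It assumes spark-sql is on the classpath, as it is in spark-shell.

```scala
import org.apache.spark.sql.Row

// Hypothetical stand-in for one collected row of the "px" column;
// in the question this would be dataRow(1).
val row: Row = Row(Seq(-0.99205, 0.379417, 0.448819))

// getSeq returns a typed Seq[Double] instead of Any.
val px: Seq[Double] = row.getSeq[Double](0)

// Individual elements are now plain Doubles.
val p1: Double = px(1)

// With a typed Seq, converting to a normal array works as expected.
val plain: Array[Double] = px.toArray
```

With the static type known, px(1) and px.toArray both compile, which is what the original wpA.toArray attempt on a value of type Any could not do.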
Tzach Zohar