
I am using the following code to get a double array from an item in a DataFrame:

val ratio1=hiveContext.sql("SELECT percentile_approx(ts, array (0.5,0.7)) from df")
val trainingPoint=ratio1.collect()(0).getAs[Array[Double]](0)(0)
val validationPoint=ratio1.collect()(0).getAs[Array[Double]](0)(1)

System.out.print("The training set from hive is :")
ratio1.show(false)

The DataFrame looks like this:

The training set from hive is :+----------+
|_c0       |
+----------+
|[5.0, 7.0]|
+----------+

So I need to extract the two double values from it.

But I get the error below:

17/05/08 09:10:32 ERROR ApplicationMaster: User class threw exception: java.lang.ClassCastException: scala.collection.mutable.WrappedArray$ofRef cannot be cast to [D
java.lang.ClassCastException: scala.collection.mutable.WrappedArray$ofRef cannot be cast to [D
        at testCase$.main(testCase.scala:40)
        at testCase.main(testCase.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:559)

How can I get the two double values from the DataFrame above?

Luckylukee
2 Answers


In a Row, an ArrayType column is represented as scala.collection.mutable.WrappedArray, so to access its values you need to read it as either a Seq or a WrappedArray, as below.

// Using Seq
val trainingPoint = ratio1.collect()(0).getAs[Seq[Double]](0)(0)
val validationPoint = ratio1.collect()(0).getAs[Seq[Double]](0)(1)

// Using WrappedArray (needs import scala.collection.mutable)
val trainingPoint = ratio1.collect()(0).getAs[mutable.WrappedArray[Double]](0)(0)
val validationPoint = ratio1.collect()(0).getAs[mutable.WrappedArray[Double]](0)(1)
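A small variant (just a sketch, reusing the ratio1 DataFrame from the question): collect the single result row once and reuse it, instead of calling collect() twice.

val firstRow = ratio1.collect()(0)            // collect the single result row once
val points = firstRow.getAs[Seq[Double]](0)   // the array column comes back as a WrappedArray, which is a Seq
val trainingPoint = points(0)                 // 0.5 percentile
val validationPoint = points(1)               // 0.7 percentile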

Below is a simple example to test this:

import org.apache.spark.sql.SparkSession
import org.scalatest.{BeforeAndAfterEach, FunSuite}

class TestWrappedArray extends FunSuite with BeforeAndAfterEach {

  val spark = SparkSession.builder().master("local").getOrCreate()

  test("test wrapped array") {
    import spark.implicits._
    val data = spark.sparkContext.parallelize(
      Seq(("a", List(1.5, 5.2)), ("b", List(2.3, 4.2)))
    ).toDF("id", "point")

    // The array column is returned as a WrappedArray, which can be read as a Seq
    data.collect()(0).getAs[Seq[Double]]("point").foreach(println)
  }
}
koiralo
// Alternative: Row.getList(0) returns the column as a java.util.List[Double], so no cast is needed
val trainingPoint = ratio1.collect()(0).getList[Double](0).get(0)
val validationPoint = ratio1.collect()(0).getList[Double](0).get(1)
Luckylukee