I am trying to convert a DataFrame to a Dataset of `C` objects. The Java classes are structured as follows:
class A:
```java
public class A {
    private int a;

    public int getA() {
        return a;
    }

    public void setA(int a) {
        this.a = a;
    }
}
```
class B:
```java
public class B extends A {
    private int b;

    public int getB() {
        return b;
    }

    public void setB(int b) {
        this.b = b;
    }
}
```
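and class C:

```java
import java.util.List;

public class C {
    // Column "a" is an array of structs, and the code below calls
    // getA().get(0), so the field is a list of A.
    private List<A> a;

    public List<A> getA() {
        return a;
    }

    public void setA(List<A> a) {
        this.a = a;
    }
}
```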
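To see which fields a bean encoder can discover on each class, the sketch below lists the properties that have both a getter and a setter, using the standard `java.beans.Introspector`. It assumes that Spark's bean encoder works from such getter/setter pairs on the *declared* class; the class name `BeanProps` and the helper `properties` are hypothetical, and `A` and `B` are copied in as nested classes so the file compiles on its own.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BeanProps {
    // Copies of the question's A and B, nested so this file is self-contained.
    public static class A {
        private int a;
        public int getA() { return a; }
        public void setA(int a) { this.a = a; }
    }

    public static class B extends A {
        private int b;
        public int getB() { return b; }
        public void setB(int b) { this.b = b; }
    }

    // Returns the names of properties that have both a getter and a setter.
    public static List<String> properties(Class<?> type) {
        try {
            BeanInfo info = Introspector.getBeanInfo(type);
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                // getClass() appears as a read-only "class" property;
                // requiring a setter as well filters it out.
                if (pd.getReadMethod() != null && pd.getWriteMethod() != null) {
                    names.add(pd.getName());
                }
            }
            Collections.sort(names); // deterministic order for printing
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("A: " + properties(A.class)); // A: [a]
        System.out.println("B: " + properties(B.class)); // B: [a, b]
    }
}
```

Under that assumption, an element declared as `A` exposes only the property `a`, so the `b` value from the data has no setter to land in.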
The data in the DataFrame is as follows:
```
+-----+
| a   |
+-----+
|[1,2]|
+-----+
```
When I apply `Encoders.bean[C](classOf[C])` to the DataFrame, the `A` reference inside `C`, which I expect to be an instance of `B`, returns `false` for `.isInstanceOf[B]`. The output of the Dataset is as follows:
```
+-----+
| a   |
+-----+
|[1,2]|
+-----+
```
How can I get all the fields of both `A` and `B` from each `C` object while iterating over the Dataset with `foreach`?

Code:
```scala
import org.apache.spark.sql.{Encoders, SparkSession}
import org.apache.spark.sql.types.{ArrayType, IntegerType, StructType}

object TestApp extends App {
  implicit val sparkSession: SparkSession = SparkSession.builder()
    .appName("Test-App")
    .config("spark.sql.codegen.wholeStage", value = false)
    .master("local[1]")
    .getOrCreate()

  // Column "a": an array of structs with integer fields "a" and "b"
  val schema = new StructType()
    .add("a", new ArrayType(new StructType().add("a", IntegerType, true).add("b", IntegerType, true), true))

  val dd = sparkSession.read.schema(schema).json("Test.txt")
  val ff = dd.as(Encoders.bean[C](classOf[C]))
  ff.show(truncate = false)

  ff.foreach(f => {
    println(f.getA.get(0).isInstanceOf[A]) // true
    println(f.getA.get(0).isInstanceOf[B]) // false
  })
}
```
Content of the input file (`Test.txt`):

```json
{"a":[{"a":1,"b":2}]}
```
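A simplified model of why the second check prints `false`, assuming the bean encoder builds each element by calling the no-argument constructor of the declared type (`A` here) and then invoking only the setters that type has. `BeanDecodeModel` and `decode` are hypothetical names, and this is not Spark's actual implementation.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.LinkedHashMap;
import java.util.Map;

public class BeanDecodeModel {
    // Copies of the question's A and B, nested so this file is self-contained.
    public static class A {
        private int a;
        public int getA() { return a; }
        public void setA(int a) { this.a = a; }
    }

    public static class B extends A {
        private int b;
        public int getB() { return b; }
        public void setB(int b) { this.b = b; }
    }

    // Hypothetical decoder: instantiates the *declared* type and copies only
    // the row values whose names match that type's setters; others are ignored.
    public static <T> T decode(Class<T> declared, Map<String, Object> row) {
        try {
            T obj = declared.getDeclaredConstructor().newInstance();
            BeanInfo info = Introspector.getBeanInfo(declared);
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                if (pd.getWriteMethod() != null && row.containsKey(pd.getName())) {
                    pd.getWriteMethod().invoke(obj, row.get(pd.getName()));
                }
            }
            return obj;
        } catch (ReflectiveOperationException | IntrospectionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // One struct from Test.txt: {"a":1,"b":2}
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("a", 1);
        row.put("b", 2); // A has no setter for "b", so decode(A.class, ...) drops it

        A asA = decode(A.class, row);
        System.out.println(asA instanceof B); // false: an A was instantiated
        System.out.println(asA.getA());       // 1

        B asB = decode(B.class, row);
        System.out.println(asB.getA() + "," + asB.getB()); // 1,2
    }
}
```

In this model the runtime class is always the declared class, so `instanceof B` can only be true when the declared element type is `B` itself.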