
My understanding: If I have a model class that extends a second model class, I shouldn't be able to access the private members of the parent class in the child class (unless I use reflection).

Extending this, I expect that when a Spark dataframe is encoded as a dataset of the child model class, it shouldn't have columns that include private members of the parent model class. (But this is not what I observe.)

More concretely, my parent class:

public class Foo {
    private int one;
    protected String two;
    protected double three;
}

The child class:

public class Bar extends Foo {
    private int four;
    protected String five;
}

I have a couple of Bar objects that I use to create a Spark dataframe i.e., Dataset<Row> like so:

Dataset<Row> barDF = session.createDataFrame(barList, Bar.class);

When, at a later point, I want to encode this as a dataset,

Dataset<Bar> barDS = barDF.as(Encoders.bean(Bar.class));

I expect barDS to have four columns, i.e. everything except the column named one, which is a private member of Foo. But the result of barDS.show() is instead:

+------+------+-----+-------+-----+
| five | four | one | three | two |
+------+------+-----+-------+-----+
| 9    | 9    | 0   | 3.0   | 3   |
| 16   | 16   | 0   | 4.0   | 4   |
+------+------+-----+-------+-----+

What am I missing when I expect the column one to be absent from the dataset? Also, which encoder can I use instead of the bean encoder so that Java's rules of inheritance are obeyed?

  • I don't know `Spark`, but it's pretty obvious that it must be using reflection to access it. As a matter of fact, not only private variables of the parent are accessed by it, also of itself. `Spark` also can't access `four` that's in the child, without using reflection – Ivo May 19 '22 at 05:57
  • There are public getters and setters as well. I should've mentioned that; but Spark encoding fails without them. So it inherits the public getters and setters of the parent class. I guess that clarifies my problem. :) Thank you for your response. – Polyphonic Mobius May 19 '22 at 09:57
  • I still don't know how to solve my problem though. If I declare the getter and setter for `one` as `protected`, my encoding of `Bar` works as expected but for `Foo` it fails. If I leave them `public`, I get unexpected behavior for `Bar`. How would I solve that? – Polyphonic Mobius May 19 '22 at 10:12
  • Sorry, can't help you there. I'm not familiar with how it works – Ivo May 19 '22 at 10:54
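
To illustrate the point from the comments: as far as I can tell, a bean encoder works from JavaBean properties (public getter/setter pairs) rather than from fields, and public accessors are inherited by the child class. The sketch below uses only the JDK's java.beans.Introspector, not Spark itself, so treat it as an approximation of what the encoder sees. The accessor methods, the BeanPropertyDemo wrapper class and the readWriteProperties helper are assumptions added for illustration; the question only shows the fields.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BeanPropertyDemo {

    // Same fields as Foo in the question; the public accessors are assumed
    // (the comments mention they exist, but their code is not shown).
    public static class Foo {
        private int one;
        protected String two;
        protected double three;

        public int getOne() { return one; }
        public void setOne(int one) { this.one = one; }
        public String getTwo() { return two; }
        public void setTwo(String two) { this.two = two; }
        public double getThree() { return three; }
        public void setThree(double three) { this.three = three; }
    }

    // Same fields as Bar in the question, again with assumed public accessors.
    public static class Bar extends Foo {
        private int four;
        protected String five;

        public int getFour() { return four; }
        public void setFour(int four) { this.four = four; }
        public String getFive() { return five; }
        public void setFive(String five) { this.five = five; }
    }

    // Hypothetical helper: returns the sorted names of all properties that
    // have both a public getter and a public setter, including inherited ones.
    public static List<String> readWriteProperties(Class<?> beanClass) {
        try {
            // Object.class as the stop class excludes the synthetic "class" property.
            BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                if (pd.getReadMethod() != null && pd.getWriteMethod() != null) {
                    names.add(pd.getName());
                }
            }
            Collections.sort(names);
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints [five, four, one, three, two]
        System.out.println(readWriteProperties(Bar.class));
    }
}
```

The private field one is never touched directly here; the property shows up for Bar only because getOne and setOne are public and inherited. That matches the five columns in the show() output above and is consistent with the observation that making those accessors protected removes the column.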

0 Answers