I have a class in Java that builds a fairly sophisticated Spark DataFrame:
package companyX;

import org.apache.spark.sql.DataFrame;

public class DFBuilder {
    public DataFrame build() {
        // ... build up the DataFrame here ...
        return dataframe;
    }
}
I add this class to the PySpark/Jupyter classpath so it's callable through Py4J (roughly as in the launch setup shown after the comparison below). Now when I call it, I get a strange type back:
b = sc._jvm.companyX.DFBuilder()
print(type(b.build()))
# prints: py4j.java_gateway.JavaObject

versus a DataFrame created on the Python side:

print(type(sc.parallelize([]).toDF()))
# prints: pyspark.sql.dataframe.DataFrame
Is there a way to convert this JavaObject into a proper PySpark DataFrame? One problem this causes is that when I call df.show() on a DataFrame built in Java, the output ends up in the Spark logs instead of the notebook cell.
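My current guess is to wrap the Py4J handle in the pyspark.sql.DataFrame constructor by hand. A minimal sketch of what I mean, assuming the notebook's sqlContext is the right second argument (I'm not sure this is a supported approach):

from pyspark.sql import DataFrame

jdf = sc._jvm.companyX.DFBuilder().build()  # raw py4j handle to the Java-side DataFrame
df = DataFrame(jdf, sqlContext)             # wrap the handle in the Python DataFrame class

print(type(df))  # pyspark.sql.dataframe.DataFrame
df.show()        # hopefully now prints in the notebook cell

As far as I can tell, this would also explain the show() problem: the Python DataFrame.show() fetches the rendered rows from the JVM and prints them on the Python side, while calling show() directly on the JavaObject prints to the JVM's stdout, which the notebook only surfaces in the Spark logs.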