Given a simple Scala case class like this:
package com.foo.storage.schema
case class Person(name: String, age: Int)
it's possible to create a Spark schema from a case class as follows:
import org.apache.spark.sql._
import com.foo.storage.schema.Person
val schema = Encoders.product[Person].schema
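For reference, for this Person class the Scala schema should (as far as I understand Spark's encoder nullability rules) correspond to the following hand-written PySpark StructType, which is exactly what I'd like to obtain without duplicating the field definitions:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# What I expect Encoders.product[Person].schema to look like on the Python side:
# a Scala String becomes a nullable StringType, a Scala Int a non-nullable IntegerType
expected_schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), False),
])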
I'd like to know whether it's possible to build the same schema from the case class in Python/PySpark. I was hoping to do something like this (Python):
jvm = sc._jvm
py4j_class = jvm.com.foo.storage.schema.Person
jvm.org.apache.spark.sql.Encoders.product(py4j_class)
This throws an error: com.foo.storage.schema.Person._get_object_id does not exist in the JVM. Encoders.product is a generic method in Scala, and I'm not entirely sure how to supply the type parameter through Py4J. Is there a way to use the case class to create a PySpark schema?
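For context, the fallback I can think of is to add a small hand-written Scala helper that exposes the schema, and then rebuild it on the Python side from its JSON form. This is only a sketch (PersonSchema is a hypothetical helper object, and I haven't verified the exact Py4J call); I'd prefer something that works directly from the case class:

import json
from pyspark.sql.types import StructType

# Hypothetical Scala helper compiled into the same JAR:
#   object PersonSchema { val schema = Encoders.product[Person].schema }
schema_json = sc._jvm.com.foo.storage.schema.PersonSchema.schema().json()
schema = StructType.fromJson(json.loads(schema_json))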