I have a Scala object containing some case objects, defined like so:
object DurationUnitsOfMeasure {
  sealed abstract class DurationUnitOfMeasure(val name: String) {
    override def toString: String = name
    lazy val initial: Char = name.charAt(2).toLower
  }

  case object Day extends DurationUnitOfMeasure("__DAY__")
  case object Week extends DurationUnitOfMeasure("__WEEK__")
  case object Month extends DurationUnitOfMeasure("__MONTH__")

  val durationUnitsOfMeasure: Seq[DurationUnitOfMeasure] = Seq(Day, Week, Month)
}
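(As background for the JVM class names that appear later: my understanding is that scalac compiles each nested object to its own class, named Outer$Inner$, whose singleton instance lives in a static MODULE$ field. A small Python sketch of that naming convention, assuming standard scalac mangling:)

```python
def scala_object_class_name(outer: str, inner: str) -> str:
    # Assumption: a Scala `object Inner` nested inside `object Outer`
    # compiles to a JVM class named "Outer$Inner$", and the singleton
    # instance is held in that class's static MODULE$ field.
    return f"{outer}${inner}$"

print(scala_object_class_name("com.package.DurationUnitsOfMeasure", "Day"))
# com.package.DurationUnitsOfMeasure$Day$
```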
This gets used by some code I'm writing to interact with Spark. I also want to interact with that code from Python, which I've done successfully using Py4J. However, I'm now at the point where I want to get hold of instances of those case objects from Python/PySpark, and I can't figure out how to do it.
I found a useful reference at https://github.com/awslabs/deequ/issues/109#issuecomment-504220206 that taught me to use javap to find the class structure of DurationUnitsOfMeasure:
$ javap -classpath ../target/scala-2.11/foo_2.11-0.1-SNAPSHOT.jar com/package/DurationUnitsOfMeasure
Compiled from "File.scala"
public final class com.package.DurationUnitsOfMeasure {
public static scala.collection.Seq<com.package.DurationUnitsOfMeasure$DurationUnitOfMeasure> durationUnitsOfMeasure();
}
which in turn led me to write this Python code:
# self.spark is an instance of SparkSession
jDurationsUnitsOfMeasure = getattr(
    self.spark._sc._jvm.com.package.DurationUnitsOfMeasure,
    "durationUnitsOfMeasure")
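(The getattr here is needed because $ is not legal in a Python identifier, so dotted access to JVM names containing $ would be a syntax error; a standalone illustration with a plain Python object:)

```python
class FakeJvmNamespace:
    # Stand-in for the Py4J JVM view, just to show the naming issue.
    pass

ns = FakeJvmNamespace()
# ns.DurationUnitsOfMeasure$Day$ would be a SyntaxError ('$' cannot
# appear in a Python identifier), but string-based access works:
setattr(ns, "DurationUnitsOfMeasure$Day$", "singleton placeholder")
print(getattr(ns, "DurationUnitsOfMeasure$Day$"))
# singleton placeholder
```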
jDurationsUnitsOfMeasure is a <py4j.java_gateway.JavaMember object at 0x7fc0dbb14850>, which I can interrogate using the usual Python methods such as dir():
(Pdb) dir(jDurationsUnitsOfMeasure)
['__call__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_build_args', '_gateway_doc', '_get_args', 'command_header', 'container', 'converters', 'gateway_client', 'name', 'pool', 'stream', 'target_id']
but I can't figure out how to do the thing I want to do, which is to get hold of an instance of DurationUnitsOfMeasure.Day. I tried this:
jDurationsUnitsOfMeasureDay = getattr(
    self.spark._sc._jvm.com.package.DurationUnitsOfMeasure,
    "durationUnitsOfMeasure$Day")
but that just bombed out with error:
py4j.protocol.Py4JError: com.package.DurationUnitsOfMeasure.durationUnitsOfMeasure$Day does not exist in the JVM
I feel like I'm not far away from being able to instantiate DurationUnitsOfMeasure.Day
from Python, but I haven't solved it yet. Any advice would be much appreciated.