
I have a Scala object containing some case objects, defined like so:

object DurationUnitsOfMeasure {
  sealed abstract class DurationUnitOfMeasure(val name: String) {
    override def toString: String = name
    // First letter after the two leading underscores, lower-cased
    // (e.g. 'd' for "__DAY__")
    lazy val initial: Char = name.charAt(2).toLower
  }
  case object Day extends DurationUnitOfMeasure("__DAY__")
  case object Week extends DurationUnitOfMeasure("__WEEK__")
  case object Month extends DurationUnitOfMeasure("__MONTH__")

  val durationUnitsOfMeasure: Seq[DurationUnitOfMeasure] = Seq(Day, Week, Month)
}

This gets used by some code I'm writing to interact with Spark. I also want to interact with that code from Python, which I've done successfully using Py4J; however, I'm now at the point where I want to get hold of instances of those case objects from Python/PySpark and I can't figure out how to do it.
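For context, self.spark in the snippets below is a SparkSession built roughly like this (the jar path is illustrative; the jar just needs to be on the driver's classpath, e.g. via spark.jars):

from pyspark.sql import SparkSession

# Put the Scala jar on the classpath so com.package.DurationUnitsOfMeasure
# is visible through the Py4J gateway at spark._sc._jvm
spark = (SparkSession.builder
         .config("spark.jars", "target/scala-2.11/foo_2.11-0.1-SNAPSHOT.jar")
         .getOrCreate())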

I found a useful reference at https://github.com/awslabs/deequ/issues/109#issuecomment-504220206 which taught me to use javap to inspect the class structure of DurationUnitsOfMeasure:

$ javap -classpath ../target/scala-2.11/foo_2.11-0.1-SNAPSHOT.jar com/package/DurationUnitsOfMeasure
Compiled from "File.scala"
public final class com.package.DurationUnitsOfMeasure {
  public static scala.collection.Seq<com.package.DurationUnitsOfMeasure$DurationUnitOfMeasure> durationUnitsOfMeasure();
}
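javap can also be pointed at the nested classes. If I understand Scala's name mangling correctly, case object Day nested inside object DurationUnitsOfMeasure compiles to a class named DurationUnitsOfMeasure$Day$, whose singleton instance lives in a public static final MODULE$ field; the quoting below is needed because of the $ signs:

$ javap -classpath ../target/scala-2.11/foo_2.11-0.1-SNAPSHOT.jar 'com.package.DurationUnitsOfMeasure$Day$'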

which in turn led me to write this Python code:

# self.spark is an instance of SparkSession
jDurationsUnitsOfMeasure = getattr(
            self.spark._sc._jvm.com.package.DurationUnitsOfMeasure,
            "durationUnitsOfMeasure")

jDurationsUnitsOfMeasure is a <py4j.java_gateway.JavaMember object at 0x7fc0dbb14850>, which I can interrogate using the usual Python tools such as dir():

(Pdb) dir(jDurationsUnitsOfMeasure)
['__call__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_build_args', '_gateway_doc', '_get_args', 'command_header', 'container', 'converters', 'gateway_client', 'name', 'pool', 'stream', 'target_id']
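Calling the member does work, and returns a JavaObject wrapping the Scala Seq (the results below are what I'd expect from the Scala definition above):

jSeq = jDurationsUnitsOfMeasure()  # invoke the JVM method via Py4J
jSeq.size()               # -> 3
jSeq.apply(0).toString()  # -> '__DAY__' (Seq.apply(i) returns element i)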

but I can't figure out how to do the thing I want to do, which is to get hold of the singleton instance of DurationUnitsOfMeasure.Day. I tried this:

jDurationsUnitsOfMeasureDay = getattr(
            self.spark._sc._jvm.com.package.DurationUnitsOfMeasure,
            "durationUnitsOfMeasure$Day")

but that just bombed out with this error:

py4j.protocol.Py4JError: com.package.DurationUnitsOfMeasure.durationUnitsOfMeasure$Day does not exist in the JVM
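I think I can see why that name doesn't exist: durationUnitsOfMeasure is the name of the val, not of the case object, and (per the javap note above) the class I actually want should be DurationUnitsOfMeasure$Day$ with its static MODULE$ field. An untested sketch of reaching it through Py4J, using getattr because $ isn't valid in a Python attribute name:

jvm = self.spark._sc._jvm
day_cls = getattr(jvm.com.package, "DurationUnitsOfMeasure$Day$")  # JavaClass
jDay = getattr(day_cls, "MODULE$")  # static field holding the singleton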

I feel like I'm not far away from being able to get hold of DurationUnitsOfMeasure.Day from Python, but I haven't solved it yet. Any advice would be much appreciated.


1 Answer


Turns out I was over-complicating it. This works:

# seqAsJavaList converts the scala.collection.Seq to a java.util.List,
# which Py4J hands back as a py4j.java_collections.JavaList
jDurationUnitsOfMeasure = self.spark._sc._jvm.scala.collection.JavaConversions.seqAsJavaList(
    self.spark._sc._jvm.com.package.DurationUnitsOfMeasure.durationUnitsOfMeasure())

That returns a py4j.java_collections.JavaList, which exists precisely so that the result can be treated as a good ol' Python list, so I can manipulate it as I would any other Python list (I prefer list comprehensions).
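For example, to pluck out a single case object by name (a sketch; toString() returns the name defined in the Scala code above):

# jDurationUnitsOfMeasure behaves like a normal Python list here
jDay = [u for u in jDurationUnitsOfMeasure if u.toString() == "__DAY__"][0]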
