I have a java class registered in PySpark, and Im trying to pass a Broadcast variable from PySpark to a method in this class. Like so:
from py4j.java_gateway import java_import
java_import(spark.sparkContext._jvm, "net.a.b.c.MyClass")
myPythonGateway = spark.sparkContext._jvm.MyClass()
with open("tests/fixtures/file.txt", "rb") as binary_file:
data = spark.sparkContext.broadcast(binary_file.read())
myPythonGateway.setData(data)
But this is throwing:
AttributeError: 'Broadcast' object has no attribute '_get_object_id'
However, if I pass the byte[] directly, without wrapping it in broadcast(), it works fine. But I need this variable to be broadcast, as it will be used repeatedly.