I'm currently trying to do the following using Py4J:

  • Define a method ("executor") in Python which calls JVM methods
  • Define a Python ("callback") object implementing a JVM interface
  • Construct a JVM object given this callback object
  • Call a method on this object which spawns a new thread in Java; that thread invokes the callback on the callback object, which in turn (on the Python side) executes the "executor" method

Here's what I have on the Java side of things:

package javabridge.test;
public interface PythonCallback {
    Object notify(Object source);
}


package javabridge.test;
public class ScheduledRunnable implements Runnable {
    private PythonCallback callback;
    public ScheduledRunnable(PythonCallback callback) {
        this.callback = callback;
    }
    @Override
    public void run() {
        System.out.println("[ScheduledRunnable] run -> notify");
        this.callback.notify(this);
    }
}


package javabridge.test;
import py4j.GatewayServer;
public class Test {
    private PythonCallback callback;
    public Test(PythonCallback callback) {
        this.callback = callback;
    }
    public void runSynchronous() {
        System.out.println("[runSynchronous] run -> notify");
        this.callback.notify(this);
    }
    public void runAsynchronous() {
        System.out.println("[runAsynchronous] run -> spawn thread");
        ScheduledRunnable runnable = new ScheduledRunnable(callback);
        Thread t = new Thread(runnable);
        t.start();
    }
    public static void main(String[] args) {
        GatewayServer server = new GatewayServer();
        server.start(true);
    }   
}

On the Python side, I have the following script:

from py4j.java_gateway import JavaGateway, java_import, get_field, CallbackServerParameters
from py4j.clientserver import ClientServer, JavaParameters, PythonParameters

gateway = JavaGateway(callback_server_parameters=CallbackServerParameters())
#gateway = ClientServer(java_parameters=JavaParameters(), python_parameters=PythonParameters())

java_import(gateway.jvm, 'javabridge.test.*')

class PythonCallbackImpl(object):
    def __init__(self, execfunc):
        self.execfunc = execfunc
    def notify(self, obj):
        print('[PythonCallbackImpl] notified from Java')
        self.execfunc()
        return 'dummy return value'
    class Java:
        implements = ["javabridge.test.PythonCallback"]

def simple_fun():
    print('[simple_fun] called')
    gateway.jvm.System.out.println("[simple_fun] Hello from python!")

# Test 1: Without threading
input('Ready to begin test 1')
python_callback = PythonCallbackImpl(simple_fun)
nothread_executor = gateway.jvm.Test(python_callback)
nothread_executor.runSynchronous()

# Test 2: With threading
input('Ready to begin test 2')
python_callback = PythonCallbackImpl(simple_fun)
thread_executor = gateway.jvm.Test(python_callback)
thread_executor.runAsynchronous()

gateway.shutdown()

Here's what happens when trying to execute this script. First, using gateway = ClientServer(java_parameters=JavaParameters(), python_parameters=PythonParameters()), both tests fail:

Test 1:

py4j.protocol.Py4JJavaError: An error occurred while calling o0.runSynchronous.
: py4j.Py4JException: Command Part is Empty or is the End of Command Part
        at py4j.Protocol.getObject(Protocol.java:277)
        at py4j.Protocol.getReturnValue(Protocol.java:458)

Test 2:

Exception in thread "Thread-4" py4j.Py4JException: Error while obtaining a new communication channel
        at py4j.CallbackClient.getConnectionLock(CallbackClient.java:218)
        at py4j.CallbackClient.sendCommand(CallbackClient.java:337)
        at py4j.CallbackClient.sendCommand(CallbackClient.java:316)

If I comment out the self.execfunc() line, test 1 works without raising errors. Test 2 still fails, however:

Exception in thread "Thread-5" py4j.Py4JException: Error while sending a command.
        at py4j.CallbackClient.sendCommand(CallbackClient.java:357)
        at py4j.CallbackClient.sendCommand(CallbackClient.java:316)

Now I switch to gateway = JavaGateway(callback_server_parameters=CallbackServerParameters()). With self.execfunc() still commented out, test 2 fails here as well:

Exception in thread "Thread-5" py4j.Py4JException: Error while sending a command.
        at py4j.CallbackClient.sendCommand(CallbackClient.java:357)
        at py4j.CallbackClient.sendCommand(CallbackClient.java:316)

At least test 1 does work here with self.execfunc() enabled.

My question is: how can I use the threaded approach with the self.execfunc() call? Is this possible with Py4J?

Edit: and to make things even more tricky, Java commands called by self.execfunc() should run in the same Java thread that invoked .notify().

Macuyiko

1 Answer

Solved. Turns out to be very simple:

  1. Use ClientServer on both sides: on the Python side (the commented-out gateway line in the script above) and on the Java side as well, in place of GatewayServer!
  2. Don't call gateway.shutdown() as this will disconnect Python before the callback can be received (duh!)
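
A minimal sketch of what the two points above look like on the Python side, reusing the class names from the question. The helpers connect and run_threaded_test are names introduced here for illustration only, and the py4j import sits inside connect merely so that the callback class can be tried out without a running JVM. It assumes the Java main has been changed to start a py4j ClientServer instead of a GatewayServer.

```python
class PythonCallbackImpl(object):
    """Python object implementing javabridge.test.PythonCallback."""

    def __init__(self, execfunc):
        self.execfunc = execfunc

    def notify(self, obj):
        print('[PythonCallbackImpl] notified from Java')
        self.execfunc()
        return 'dummy return value'

    class Java:
        implements = ["javabridge.test.PythonCallback"]


def connect():
    """Hypothetical helper: open a ClientServer connection with default settings."""
    from py4j.clientserver import ClientServer, JavaParameters, PythonParameters
    return ClientServer(java_parameters=JavaParameters(),
                        python_parameters=PythonParameters())


def run_threaded_test(gateway):
    """Hypothetical helper: trigger the asynchronous callback from test 2."""

    def simple_fun():
        print('[simple_fun] called')
        gateway.jvm.System.out.println("[simple_fun] Hello from python!")

    executor = gateway.jvm.javabridge.test.Test(PythonCallbackImpl(simple_fun))
    executor.runAsynchronous()
    # Deliberately no gateway.shutdown() here: per point 2, that would
    # disconnect Python before the asynchronous callback arrives.
    return executor
```

With the Java side running, run_threaded_test(connect()) would replace the body of test 2 in the original script.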

Java will then neatly adhere to the expected thread model, i.e. Java commands called by the receiving Python callback are executed in the same Java thread that performed the callback.

Through a simple Python function, a shutdown_when_done method can be added which waits until all callbacks have come back before quitting.
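
One possible shape for such a helper, as a sketch rather than the code I actually used: CallbackTracker and its method names are made up for this example, and the only thing assumed about the gateway is that it has a shutdown() method. The idea is to count callbacks that are expected but have not yet come back, and to block until that count reaches zero.

```python
import threading


class CallbackTracker(object):
    """Counts callbacks that are expected but have not yet completed."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = 0

    def expect(self):
        """Call before invoking a Java method that will call back later."""
        with self._cond:
            self._pending += 1

    def done(self):
        """Call as the last step of the Python callback (e.g. in notify)."""
        with self._cond:
            self._pending -= 1
            self._cond.notify_all()

    def wait(self, timeout=None):
        """Block until nothing is pending; return False if the timeout expires."""
        with self._cond:
            return self._cond.wait_for(lambda: self._pending <= 0, timeout)


def shutdown_when_done(gateway, tracker, timeout=None):
    """Shut the gateway down once every expected callback has come back."""
    finished = tracker.wait(timeout)
    if finished:
        gateway.shutdown()
    return finished
```

Usage would be tracker.expect() before runAsynchronous(), tracker.done() at the end of notify, and shutdown_when_done(gateway, tracker) as the last line of the script. Note that done() necessarily runs before notify has returned its value to Java, so a short delay before the actual shutdown may still be needed.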

Macuyiko
  • By any chance, were you using this approach to connect to Apache Spark executors? (pySpark) – Tagar Apr 04 '18 at 22:37
  • No, though the majority of posts and questions do pop up in the context of pySpark, which seems to be the main user of Py4J. – Macuyiko Apr 06 '18 at 14:24