I am using py4j to communicate from Java to Python. I packed the Java code below into a JAR and I am running it with the command java -jar file.jar. Looking at the process, however, I can see that it shows up about 30 times as separate threads, although I ran the command only once. I suspect this happens because of the way py4j is implemented. How can I set a maximum on the number of threads py4j uses? Alternatively, what kind of communication between Java and Python could I use that would need less memory?

public static void main(String[] args) {
    final GroupTerms groupTerms = new GroupTerms();
    new GatewayServer(groupTerms).start();
}
Yael Green

1 Answer

The question is light on details. Here are the two main factors that influence the number of threads created by Py4J:

  1. Number of concurrent requests. Py4J creates as many threads as needed on the server side, but tries to reuse them when possible. For example, if you have 10 Java threads calling Python at the same time, Py4J might end up creating 10 threads on the Python side. The same is true for the other direction.
  2. Recursion depth between Java and Python. The regular threading model used by Py4J will create a new thread for each recursion level between Java and Python (e.g., Java calls Python that calls Java that calls Python). You can use the pinned-thread model to ensure that only one thread is created.

More details about the interactions between your Java and Python code would definitely help troubleshoot what is going on.
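
One way to gather such details is to list the JVM's live Java threads before and after the call under suspicion. The sketch below is only an illustration and uses nothing but the standard library: the names ThreadCensus, liveThreadNames, newThreadsSince and the "stand-in-server" thread are hypothetical, and the stand-in thread merely marks the spot where the real `new GatewayServer(groupTerms).start()` call would go. Note that `Thread.getAllStackTraces()` reports Java threads only, so JVM-internal native threads that OS tools count may not be listed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical helper, not part of Py4J: compares thread snapshots.
class ThreadCensus {

    // Names of all live Java threads in this JVM, sorted for readability.
    static List<String> liveThreadNames() {
        List<String> names = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            names.add(t.getName());
        }
        Collections.sort(names);
        return names;
    }

    // Names of threads that are live now but were not in the earlier snapshot.
    static List<String> newThreadsSince(List<String> before) {
        List<String> added = liveThreadNames();
        for (String name : before) {
            added.remove(name); // removes one occurrence per earlier thread
        }
        return added;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> before = liveThreadNames();
        System.out.println("Before: " + before.size() + " threads " + before);

        // Stand-in for the call being investigated, e.g.
        // new GatewayServer(new GroupTerms()).start();
        CountDownLatch release = new CountDownLatch(1);
        Thread standIn = new Thread(() -> {
            try {
                release.await(); // stay alive until the snapshot is taken
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "stand-in-server");
        standIn.start();

        System.out.println("Started since: " + newThreadsSince(before));

        release.countDown();
        standIn.join();
    }
}
```

If the "Before" list is already long, the extra threads belong to the JVM or to other code rather than to the gateway. Running `jstack <pid>` against the process gives a similar view from outside.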

Barthelemy
  • 1. All of the ~30 threads are created when I start the Java GatewayServer, before ANY request is sent. 2. Recursion depth between Java and Python is one: Python sends a request to Java and receives a response. – Yael Green Apr 26 '17 at 08:53
  • Py4J only creates a thread for the server (this is optional) and when receiving requests. If you just start a GatewayServer and you do not receive any request, I don't see how Py4J could end up creating 30 threads. Please open a bug report with a reproducible example and I'll look into it. Are you sure that GatewayServer is the one creating the threads, e.g., if you do not create a GroupTerms instance and do not start a Python interpreter, does it still create 30 threads? – Barthelemy Apr 26 '17 at 09:46