I am running a Flink job in standalone deployment mode that uses DJL (Deep Java Library) to load a PyTorch model. The model loads successfully, and I can cancel the job through the Flink REST API. However, when I launch the job again, it throws:

UnsatisfiedLinkError: <pytorch>.so already loaded in another classloader

Loading the model again requires restarting the standalone deployment. Is it possible to shut down the process as part of the cancel-job request, so that I can load the model again without a restart?

Daniel Widdis
Vim

1 Answer

A native library can be loaded by only one classloader at a time within a JVM. In DJL, the PyTorch native library is loaded when the Engine class is initialized. If the library has already been loaded by another classloader, the Engine class fails to initialize.

One workaround is to load the native library in the system ClassLoader, so that it can be shared by child classloaders. DJL lets you inject a NativeHelper class to load the native library; make sure your NativeHelper is on the system classpath:

    System.setProperty("ai.djl.pytorch.native_helper", "org.examples.MyNativeHelper");

You can find the test code for NativeHelper here

See this link for more detail

In your MyNativeHelper class, you only need to add the following:

    public static void load(String path) {
        System.load(path);
    }

At runtime, DJL invokes your load(String path) function, so the native library is loaded in the ClassLoader that loaded your helper class.
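
To make the mechanism more concrete, here is a minimal sketch of how a helper named by a system property can be resolved and called reflectively. It is an illustration under assumptions, not DJL's actual implementation: `NativeHelperDemo`, `RecordingHelper`, `loadViaHelper`, and the `/tmp/libtorch.so` path are hypothetical names. The stand-in helper only records the path instead of calling `System.load(path)`, so the sketch runs without a native library.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; none of these names come from DJL itself.
class NativeHelperDemo {

    // Stand-in for MyNativeHelper. A real helper would call System.load(path);
    // this one only records the path so the sketch runs without libtorch.
    static class RecordingHelper {
        static final List<String> LOADED = new ArrayList<>();

        public static void load(String path) {
            LOADED.add(path);
        }
    }

    // Assumed mechanism: resolve the helper class by name and call its
    // public static load(String) method through reflection.
    static void loadViaHelper(String helperClassName, String path) {
        try {
            Class<?> helper = Class.forName(helperClassName);
            Method load = helper.getMethod("load", String.class);
            load.invoke(null, path); // null target because load is static
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot invoke helper " + helperClassName, e);
        }
    }

    public static void main(String[] args) {
        // Same property key as in the answer; the value points at the stand-in helper.
        System.setProperty("ai.djl.pytorch.native_helper", RecordingHelper.class.getName());
        String helperName = System.getProperty("ai.djl.pytorch.native_helper");
        loadViaHelper(helperName, "/tmp/libtorch.so"); // hypothetical path
        System.out.println(RecordingHelper.LOADED);
    }
}
```

The point of the indirection is that `System.load` ties the library to the classloader of the class that calls it. If the helper is loaded by a long-lived classloader (for example from a jar in Flink's lib folder), the library stays tied to that classloader across job restarts. If the helper is also bundled in the job jar, it may end up loaded by the per-job classloader instead, which would bring back the original error.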

Frank Liu
  • What does the path variable refer to? `public static void load(String path) { System.load(path); // NOPMD }` – Vim Jan 14 '22 at 14:02
  • `path` is the file path of the `libtorch.so` file. DJL locates the PyTorch native library and passes its file path to the `MyNativeHelper.load(String path)` function. – Frank Liu Jan 15 '22 at 16:35
  • I have tried the following steps: 1) Created the NativeHelper class and added it to the CLASSPATH variable. 2) Ran the JUnit test as specified in the example, and the unit test failed with _ClassNotFoundException_. My execution environment is Ubuntu 20.04. Am I missing an additional step? – Vim Jan 18 '22 at 11:31
  • Can you provide the stack trace? – Frank Liu Jan 19 '22 at 21:35
  • The problem was that MyNativeHelper was not available on the classpath. However, I still get the same classloader exception, even though MyNativeHelper is called. MyNativeHelper is packaged as a jar file, and I set the system property before calling PyTorch model inference. – Vim Feb 15 '22 at 14:48
  • I created a jar file that contains the MyNativeHelper class and placed it under the /lib folder. Based on the following example [link](https://docs.djl.ai/docs/demos/apache-beam/ctr-prediction/index.html), I tried to run the example on the Flink runner with multiple instances. – Vim Feb 15 '22 at 18:53