Creating a DL4J Java application

Question

How can I use DL4J in my own Java project using Maven?

For this project I need to shade the dependencies into my project's jar. Doing this makes the jar almost 500MB when it is normally about 4MB, and it also causes the jar to crash (this is before I even do anything with the shaded dependencies), so I assume I am doing something wrong.

Here is what I added to my pom:

<properties>
    <java.version>1.8</java.version>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <dl4j-master.version>1.0.0-M2</dl4j-master.version>
    <logback.version>1.2.3</logback.version>
    <maven-shade-plugin.version>2.4.3</maven-shade-plugin.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-core</artifactId>
        <version>${dl4j-master.version}</version>
    </dependency>

    <dependency>
        <groupId>org.nd4j</groupId>
        <artifactId>nd4j-native</artifactId>
        <version>${dl4j-master.version}</version>
    </dependency>
</dependencies>

Then, to resolve the Maven errors, I went into my project structure and added the following libraries. I then clicked "add to project", which created a libs folder and downloaded a ton of files. This did resolve all the errors.

org.apache.cassandra:cassandra-all:1.1.42
org.deeplearning4j:deeplearning4j-core:1.0.0-alpha2

Looking in my libs folder, I know I can delete the Android stuff, but I assume I need to keep all of the Windows, Linux, and macOS jars so that I can run my jar on multiple operating systems. However, there is a lot more in there that I am not sure about.

My objective with DL4J is to train a model to recognize a pattern and return an array of size 2 holding one of the 9 permutations (ordered pairs) of 0, 1, and -1, for example:

new int[]{0, 1}, new int[] {1, 0}, new int[] {1, -1}

To train the model I can supply it with as much data as it needs (please let me know an estimated amount). This data will be of high quality, meaning that each record will contain a few integers and booleans and will be assigned one of the 9 array permutations (all of this data will be accurate). From this I am hoping to train the model to output all array permutations that a given set of data satisfies, ordered from most to least likely. Also, how fast would a trained model be able to perform these calculations? I would greatly appreciate any insight into the dependencies and structure needed to achieve my desired outcome.
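One possible way to frame the output described above is as a 9-class classification problem: each of the 9 pairs gets a class index, the network emits one probability per class, and the pairs are then sorted by probability. The sketch below is plain Java with no DL4J calls; the class name `PairLabels`, the index layout, and the assumption that the model yields a 9-element probability array are all illustrative choices, not something stated in the question.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: maps the 9 ordered pairs over {-1, 0, 1} to class indices 0..8.
final class PairLabels {
    private static final int[] VALUES = {-1, 0, 1};

    // Pair (first, second) -> class index, e.g. (-1, -1) -> 0 and (1, 1) -> 8.
    static int encode(int first, int second) {
        if (first < -1 || first > 1 || second < -1 || second > 1) {
            throw new IllegalArgumentException("values must be -1, 0 or 1");
        }
        return (first + 1) * 3 + (second + 1);
    }

    // Class index -> pair, the inverse of encode.
    static int[] decode(int classIndex) {
        if (classIndex < 0 || classIndex >= 9) {
            throw new IllegalArgumentException("class index must be in 0..8");
        }
        return new int[] {VALUES[classIndex / 3], VALUES[classIndex % 3]};
    }

    // Orders all pairs from most to least likely, given one probability per class
    // (assumed here to come from the model's output layer).
    static List<int[]> rank(double[] probabilities) {
        if (probabilities.length != 9) {
            throw new IllegalArgumentException("expected 9 probabilities");
        }
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < probabilities.length; i++) {
            order.add(i);
        }
        order.sort(Comparator.comparingDouble((Integer i) -> probabilities[i]).reversed());
        List<int[]> ranked = new ArrayList<>();
        for (int index : order) {
            ranked.add(decode(index));
        }
        return ranked;
    }
}
```

With this layout, training labels would be the class index from `encode`, and `rank` would be applied to whatever probability vector the trained model produces for one input record.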

Here are images of my libs folder in case you are curious.

(I am now shading it correctly so I don't have the libs folder)

However, I still need to figure out how to configure the pom.xml so that it does not include all the jars related to iOS and Android.

Are you OK with building platform-specific JARs? Like building one JAR for Linux, one for Windows and one for MacOS? — dan1st, Jun 01 '22 at 20:36
No, because I am making a plugin jar, so that is already handled. However, if that is required for DL4J I can look into detecting operating systems, but I wouldn't know how to go from that to selecting the correct libs jar. — , Jun 01 '22 at 21:03
We can select by platform: https://github.com/bytedeco/javacpp-presets/wiki/Reducing-the-Number-of-Dependencies — Samuel Audet, Jun 01 '22 at 23:17
If you want to use a single fat JAR, you would have to include the native libraries (as you are doing) resulting in big files. — dan1st, Jun 02 '22 at 05:14
Aside from that, where did you get the `lib` folder from? Maven typically doesn't require one. — dan1st, Jun 02 '22 at 05:16
I don't know; it made me add the dependencies manually, so I chose to use a libs folder so I could figure out why the file is so large. I am now shading it without the libs folder. — , Jun 02 '22 at 08:13

score 3 · Answer 1 · answered Jun 01 '22 at 23:22

Firstly, just looking at your versions: try to keep them up to date. alpha2 is years old at this point. Try to use M1.1 or M2.

Short answer: use -Djavacpp.platform=${YOUR_PLATFORM}. That's just how things work for a Java library that uses C++ code.

Long answer:

Assuming you'll use a newer version, I can give you some general advice: when building jars, we build native dependencies for different operating systems. That means C/C++ code per OS. We do this for performance reasons. Most libraries with native code work like this.

Java on its own is not capable of fast math code that also runs on GPUs. It won't be for years to come. Even most current efforts are not going to be fast enough for many ML workloads.

I know this is not normal in Java, but it is a common trade-off for any Java library with a native component. That includes something like Netty, which is used in Cassandra, which you're also using there.

The way we do this is via JavaCPP. JavaCPP is what allows us to automatically generate bindings per platform to the underlying C++ code that powers the math routines we actually run when doing calculations.

When building an uber jar, you can either just let the jar build with mvn clean package, which bundles the native binaries for every platform, or you can specify -Djavacpp.platform= during the build.

That will allow you to only include the dependencies you want.

Note the trade-off: a jar built this way will only run on the platforms you selected.

You can't have a multi-platform jar without also accepting the trade-off of the larger size. This isn't a problem for most people.

If you want to hand-optimize a multi-platform jar, you can also manually include just the platforms you want by specifying their classifiers. Be prepared to learn how Maven classifiers work if you go this route.
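For the "which platform am I on" part, here is a minimal sketch of how platform names of the form used with -Djavacpp.platform= (for example linux-x86_64 or windows-x86_64) could be derived from the JVM's system properties. JavaCPP performs its own detection internally; the `PlatformNames` class and its method names below are hypothetical and only mirror that naming convention for a handful of common OS and architecture values.

```java
import java.util.Locale;

// Hypothetical helper that mirrors the "<os>-<arch>" platform naming convention.
final class PlatformNames {

    // Maps os.name / os.arch style strings to a platform name such as "linux-x86_64".
    static String platformFor(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);

        String osPart;
        if (os.startsWith("windows")) {
            osPart = "windows";
        } else if (os.startsWith("mac")) {
            osPart = "macosx";
        } else if (os.startsWith("linux")) {
            osPart = "linux";
        } else {
            throw new IllegalArgumentException("unsupported OS in this sketch: " + osName);
        }

        String archPart;
        if (arch.equals("amd64") || arch.equals("x86_64") || arch.equals("x86-64") || arch.equals("x64")) {
            archPart = "x86_64";
        } else if (arch.equals("aarch64") || arch.equals("arm64")) {
            archPart = "arm64";
        } else {
            throw new IllegalArgumentException("unsupported architecture in this sketch: " + osArch);
        }

        return osPart + "-" + archPart;
    }

    // Platform name for the JVM this code is running on.
    static String currentPlatform() {
        return platformFor(System.getProperty("os.name"), System.getProperty("os.arch"));
    }
}
```

A launcher or plugin loader could use a string like this to decide which platform-specific jar to pick up or download, along the lines suggested in the comments on this answer.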

Regarding performance: I'm not sure what you're expecting, but it should be fast enough for whatever you're hoping for. Performance is always going to be relative to the neural net you build (neural nets vary in size) and the size of the data you are dealing with.

Use our project template for the rest: https://github.com/eclipse/deeplearning4j-examples/tree/master/mvn-project-template

I'm not sure where you got your versions from or which tutorials you found, but the project would appreciate feedback if you can't find something. Please feel free to file an issue at https://github.com/eclipse/deeplearning4j/issues. Thanks.

I only need to use a CPU and I need it to run on both Windows and Linux, so is there a good way to switch between the two depending on the system my jar is running on? Also, do I really need all of those jar files in my libs folder? — , Jun 02 '22 at 00:51
Just create jars for only the libraries you need. The above still matters regardless of your use case. There are reasons it is set up the way it is (platform-specific binaries in C/C++). Alternatively, have users download the jar relevant to their system; that is very standard for almost any application. Depending on how your application is built, you could have it dynamically download the relevant jar to your lib directory on startup. — Adam Gibson, Jun 02 '22 at 02:41
Yes but which jars do I actually need to create the model I described above? Look at those screen shots. — , Jun 02 '22 at 03:36
Look at the template I posted. It's a whole GitHub project for what you need. Please refer to those examples. Those are the only official ones and you can reach us more easily there. — Adam Gibson, Jun 02 '22 at 05:12
Alright, I am now using the correct shading but the file is still 900MB. Is there a way I can stop it from shading all the Android and iOS jars from inside my pom.xml? I added my updated pom to my original question. Also, the comment says that I should use "nd4j-native-platform" if I only want to train with the CPU, but when I add -platform to my artifactId I get an error. Any thoughts? — , Jun 02 '22 at 08:14
Yes, as mentioned above, we can select by platform: https://github.com/bytedeco/javacpp-presets/wiki/Reducing-the-Number-of-Dependencies — Samuel Audet, Jun 03 '22 at 02:58
