1

I have two components that deal with n-dimensional arrays. One component is written in Python; it processes the data and saves the processed ndarray with tobytes(). The other component is written in Java and needs to read the serialized ndarray produced by the first component.

I am curious whether there are any existing Java libraries that can read a serialized numpy array, or whether there is a better way to pass an ndarray between Java and Python.
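For context, tobytes() emits only the raw element buffer, so any reader (Java or otherwise) has to be told the dtype, byte order and shape separately. A minimal sketch of that round trip, done entirely in Python for illustration; the helper names `to_raw` and `from_raw` are made up for this example:

```python
import numpy as np

def to_raw(arr):
    """Return the raw buffer plus the metadata that tobytes() does not carry."""
    # dtype.str includes the byte order, e.g. '<f8' for little-endian float64.
    return arr.tobytes(), arr.shape, arr.dtype.str

def from_raw(raw, shape, dtype_str):
    """Rebuild the array; this only works if shape and dtype are known."""
    return np.frombuffer(raw, dtype=dtype_str).reshape(shape)

arr = np.arange(6, dtype=np.float64).reshape(2, 3)
raw, shape, dtype_str = to_raw(arr)

# The buffer holds nothing but the elements: 6 values * 8 bytes each.
print(len(raw), shape, dtype_str)
print(from_raw(raw, shape, dtype_str))
```

A Java reader of this format would need the same three pieces of information to interpret the bytes.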

Any advice is appreciated!

Thank you!

desertnaut
Alfred

2 Answers

0

ND4J supports reading from and writing to Numpy arrays. Look at the ND4J javadocs for the xxxNpyYYYArray methods.

It can read and write from/to files, byte arrays and even raw pointers to a numpy array.
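Note that the "Npy" in those method names refers to NumPy's .npy format, which carries a header describing dtype, byte order and shape. On the Python side, that format is produced by np.save, not by tobytes(). A minimal sketch of producing such a byte array in memory (the helper names `to_npy_bytes` and `from_npy_bytes` are made up for this example; whether a given ND4J version accepts every dtype is not something this sketch verifies):

```python
import io
import numpy as np

def to_npy_bytes(arr):
    """Serialize an array in .npy format (header + data) into bytes."""
    buf = io.BytesIO()
    np.save(buf, arr, allow_pickle=False)
    return buf.getvalue()

def from_npy_bytes(data):
    """Read .npy bytes back; shape and dtype come from the header."""
    return np.load(io.BytesIO(data), allow_pickle=False)

arr = np.arange(6, dtype=np.float32).reshape(2, 3)
payload = to_npy_bytes(arr)

# .npy payloads start with the magic string b'\x93NUMPY'; raw tobytes() output does not.
print(payload[:6])
print(arr.tobytes()[:6])
print(from_npy_bytes(payload))
```

Bytes produced this way are what a .npy reader would expect to receive; a raw tobytes() dump has no header for it to parse.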

The pointer methods allow for using the arrays without copying or serialization. We use the pointer methods inside jumpy (which runs Java via pyjnius) and when using javacpp's cpython/numpy preset to run a cpython interpreter inside a Java process.

wm_eddie
  • @vm_eddie Thank you for pointing that out. On the Python side, I save the numpy array with `ndarray.tobytes()` and encode it with base64. On the Java side, I decode the base64 and load it with `Nd4j.createNpyFromByteArray()`. However, I am getting `Assertion failed: (littleEndian), function parseNpyHeaderStr, file /Volumes/jenkins_ws/jenkins/workspace/deeplearning4j-deeplearning4j-1.0.0-beta4-macosx-x86_64-cpu/libnd4j/include/cnpy/cnpy.cpp, line 216.` Do you have any idea what that means? – Alfred Oct 23 '19 at 18:22
0

I have used Apache Arrow to solve this.

First, the pyarrow package has a numpy ndarray API to serialize the array into bytes. Basically, the ndarray becomes an Arrow batch serialized as a byte sequence.

Then the Java API provides a VectorSchemaRoot to read it from the bytes, and you can get the values in the Arrow array. You can use this array to create an ND4J array (if you need one) or operate on it directly.

For detailed operations, refer to the Apache Arrow docs; if you hit any obstacles, we can discuss them here.

Also, Arrow uses native memory to store the buffer, so the data lives off the Java heap. This may be an issue at some point.

If you have other solutions, please share them with me. :)

Litchy