
I've been reading the Apache Arrow docs, and I've figured out how to use it in Java and in C++. What I'd like to do now is offload some work from Java to JNI (C/C++) code, but the documentation (e.g. https://arrow.apache.org/docs/java/cdata.html) doesn't seem to cover my use case, and methods used in the examples (e.g. getMemoryAddress on IntVector) don't seem to exist when I try to call them. I want to start simple, so here's what I'd like to do:

  • Allocate two Arrow IntVectors in Java and fill them with data
  • Allocate space for another IntVector in Java for the result
  • Get whatever native pointers I need from those vectors and pass them through a JNI call
  • Wrap those vectors in C++ so I can access them.
  • Do whatever work I want to offload and finalize the result vector
  • Return to Java and have the result accessible.

Can anyone point me to an example or some tips on how to do this?

BTW, the examples also use JavaCPP instead of JNI. But I already have a bunch of JNI code in this project, and I'd rather not mix in another kind of bridge if it's not necessary.

Thanks.

I tried allocating IntVector objects in Java, but I can't tell which native pointers I have to retrieve and pass to C++ to give it proper access to those vectors.

  • BTW, for that kind of use case, using the C++ API of Arrow from Java, using something like the JavaCPP Presets for Arrow, is bound to be a lot more efficient than using the Java implementation of Arrow: https://github.com/bytedeco/javacpp-presets/tree/master/arrow – Samuel Audet Jan 29 '23 at 08:10

1 Answer


JavaCPP is merely a convenience for the example; JNI is fine.

The C Data Interface is still what you want. When you say "get whatever native pointers I need": that is exactly what a struct ArrowArray is in the C Data Interface. Use the C Data Interface module in Java to export your Java arrays and get the address of a struct ArrowArray, and then pass that address to your C++ code via JNI. Then, use libarrow's C Data Interface implementation to import the arrays and work with them.
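To make the "address of a struct ArrowArray" concrete, here is a minimal C++ sketch of the consuming side. It is not libarrow's importer: it reads the struct directly, using the layout published in the C Data Interface specification, so that it compiles without any Arrow dependency. With libarrow you would instead hand the struct (plus its ArrowSchema) to arrow::ImportArray from arrow/c/bridge.h. The helper names int32_values and add_int32 are hypothetical, int64_t stands in for the jlong a JNI function would receive, and the sketch assumes null-free int32 arrays.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Struct layout copied from the Arrow C Data Interface specification;
// libarrow provides the same definition in arrow/c/abi.h.
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

// Hypothetical helper: pointer to the first logical int32 value.
// For a primitive array, buffers[0] is the validity bitmap and
// buffers[1] is the data buffer; `offset` is counted in elements.
const int32_t* int32_values(const ArrowArray* array) {
  if (array == nullptr || array->release == nullptr) {
    throw std::invalid_argument("array is null or already released");
  }
  if (array->n_buffers != 2 || array->null_count != 0) {
    throw std::invalid_argument("expected a null-free int32 array");
  }
  return static_cast<const int32_t*>(array->buffers[1]) + array->offset;
}

// Hypothetical helper: element-wise sum of two exported int32 arrays,
// identified by the addresses Java passed through JNI.
std::vector<int32_t> add_int32(int64_t lhs_address, int64_t rhs_address) {
  auto* lhs = reinterpret_cast<ArrowArray*>(static_cast<intptr_t>(lhs_address));
  auto* rhs = reinterpret_cast<ArrowArray*>(static_cast<intptr_t>(rhs_address));
  const int32_t* a = int32_values(lhs);
  const int32_t* b = int32_values(rhs);
  if (lhs->length != rhs->length) {
    throw std::invalid_argument("length mismatch");
  }
  std::vector<int32_t> out(static_cast<size_t>(lhs->length));
  for (int64_t i = 0; i < lhs->length; ++i) {
    out[static_cast<size_t>(i)] = a[i] + b[i];
  }
  // The consumer owns the exported arrays and must release them when done.
  lhs->release(lhs);
  rhs->release(rhs);
  return out;
}
```

In a real JNI function the C++ exceptions would have to be caught and translated into Java exceptions rather than allowed to propagate across the JNI boundary.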

When the C++ side is done, it does the same thing: it exports the result vector and returns an address to Java via JNI; the Java code then imports the vector from that address.
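The return trip can be sketched the same way. The block below is a hand-written export, shown only to illustrate what ends up in the struct; with libarrow you would call arrow::ExportArray from arrow/c/bridge.h instead. It assumes the ArrowArray struct itself was allocated by the caller and that its address was passed in, which is a variation on returning a fresh address from C++. The names Int32Holder, release_int32 and export_int32 are hypothetical. A matching ArrowSchema (format string "i" for int32) would also be needed for the Java import and is omitted here.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Struct layout copied from the Arrow C Data Interface specification;
// libarrow provides the same definition in arrow/c/abi.h.
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

namespace {

// Hypothetical holder that keeps the values and the buffer table alive
// for as long as the consumer holds the exported struct.
struct Int32Holder {
  std::vector<int32_t> values;
  const void* buffers[2];
};

// Release callback: frees the producer-owned memory, then marks the
// struct as released by clearing `release`, as the specification requires.
void release_int32(ArrowArray* array) {
  delete static_cast<Int32Holder*>(array->private_data);
  array->release = nullptr;
}

}  // namespace

// Hypothetical helper: fills the ArrowArray located at `out_address`
// (assumed to be allocated by the caller) with a null-free int32 array.
void export_int32(std::vector<int32_t> values, int64_t out_address) {
  auto* out = reinterpret_cast<ArrowArray*>(static_cast<intptr_t>(out_address));
  auto* holder = new Int32Holder{std::move(values), {nullptr, nullptr}};
  holder->buffers[0] = nullptr;                 // no validity bitmap: no nulls
  holder->buffers[1] = holder->values.data();   // int32 data buffer
  out->length = static_cast<int64_t>(holder->values.size());
  out->null_count = 0;
  out->offset = 0;
  out->n_buffers = 2;
  out->n_children = 0;
  out->buffers = holder->buffers;
  out->children = nullptr;
  out->dictionary = nullptr;
  out->release = release_int32;
  out->private_data = holder;
}
```

Once Java has imported the array, the Java side becomes responsible for triggering the release callback, which it does when the imported vector is closed.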

li.davidm
  • Thanks for the response. Looking at the example, it looks like there's a lot of back-and-forth between JNI and Java code. For example, native memory has to be allocated for the ArrowArray and passed to Java so it can populate the structure. Is there no way to avoid all that overhead? Would it be better to create the vectors in C++, import them into Java, and fill them there? – Timothy Miller Jan 23 '23 at 19:18
  • The example is just one way of using it. You can use ArrowArray.allocateNew as well (which will be backed by a DirectByteBuffer). – li.davidm Jan 23 '23 at 23:16