I am working on using Java to read (potentially) large amounts of data from (potentially) large files. The scenario is uncompressed imagery in a file format like HEIF, where files larger than 2 GB are likely. Writing is a future need, but this question is scoped to reading.

The HEIF format (which is derived from the ISO Base Media File Format, ISO/IEC 14496-12) consists of variable-sized "boxes": you read the length and kind of each box, then do whatever parsing is appropriate to that box. In my design, I'll parse the small-ish boxes and keep references to the bulk storage (mdat) offsets, so that I can pull the data out for rendering or processing as requested.

I'm considering two options: multiple MappedByteBuffers (since each one is limited to 2 GB), or a single MemorySegment (from a memory-mapped file). It's not clear to me which is likely to be more efficient. MappedByteBuffer has all the nice ByteBuffer API, but I need to manage multiple buffers. The MemorySegment would be a single entity, but it looks like I'll need to create slice views to get anything I can read from (e.g. a byte array or ByteBuffer), which looks like a different version of the same problem. A secondary benefit of MemorySegment is that it may lead to a nicer design when I need to use some non-Java API (like feeding the imagery into a hardware encoder for compression). I also have the skeleton of the MemorySegment approach implemented and reading (just with some gross assumptions that I can turn it into a single ByteBuffer).

Are there emerging patterns for efficient reading from a MemorySegment? Failing that, is there something I'm missing in the MemorySegment API?

BradHards
    For JDK 17, you could take a look at the MemoryAccess class: https://docs.oracle.com/en/java/javase/17/docs/api/jdk.incubator.foreign/jdk/incubator/foreign/MemoryAccess.html Alternatively, you could create a MemoryLayout, and use the [varhandle](https://docs.oracle.com/en/java/javase/17/docs/api/jdk.incubator.foreign/jdk/incubator/foreign/MemoryLayout.html#varHandle(java.lang.Class,jdk.incubator.foreign.MemoryLayout.PathElement...)) method to get a VarHandle that can be used to access certain fields (but for variable sized data you probably want to stick with the accessors in MemoryAccess). – Jorn Vernee Nov 12 '21 at 20:26
  • Thanks Jorn. If you paste this into an answer I'll accept it. – BradHards Nov 12 '21 at 20:49

1 Answer


This question is already 1.5 years old and concerns an API that is still evolving and has not yet been finalized. You have probably made a decision by now, but it's never too late for an answer to your question.

My suggestion is the new FFM (Foreign Function & Memory) API being introduced into Java, which is the API that includes MemorySegment. It is intended as a replacement for JNI and as an alternative to the ByteBuffer APIs for off-heap memory. It gives you much more control over memory management, frees memory more deterministically, addresses memory with long offsets (so a segment is not limited to 2 GB the way a ByteBuffer is), gives you more control over reading and writing primitives and the offsets at which you do so, and lets you create structured accessors for related primitive data in a segment.

Be aware, however, that this API has only recently moved from the incubator phase into the preview phase. This means that the API is nearing completion but is not yet stable. It may change in upcoming Java versions, so you may have to update your codebase for each new JDK release, and preview APIs have to be enabled explicitly with --enable-preview. Your project will not be backward- and forward-compatible across JDK versions until the API exits the preview state.

> it looks like I'll need to create slice views to get anything I can read from (e.g. a byte array or ByteBuffer)

This is not the case: you can read values straight from the segment at a given offset. There are some examples in the JEPs.

MemorySegment provides direct access methods such as get(layout, offset), set(layout, offset, value), getAtIndex and setAtIndex for primitives and for addresses (Addressable, a sealed interface, and MemoryAddress in the Java 19 preview API).
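As a rough sketch of what that looks like for the box headers described in the question, the following reads a header directly from a segment without creating a slice or a ByteBuffer. It assumes the java.lang.foreign API as finalized in Java 22 (earlier preview releases differ in some signatures), and the names BoxHeaderReader, BoxHeader and readHeader are made up for illustration. The size rules (32-bit size, 4-byte type, size 1 meaning a 64-bit size follows, size 0 meaning "to end of file") are the usual ISO BMFF ones.

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class BoxHeaderReader {
    // ISO BMFF fields are big-endian and are not guaranteed to be aligned,
    // so the unaligned layouts are used with an explicit byte order.
    static final ValueLayout.OfInt U32_BE =
            ValueLayout.JAVA_INT_UNALIGNED.withOrder(ByteOrder.BIG_ENDIAN);
    static final ValueLayout.OfLong U64_BE =
            ValueLayout.JAVA_LONG_UNALIGNED.withOrder(ByteOrder.BIG_ENDIAN);

    // Hypothetical holder for the parsed header; offsets are longs, so they
    // can point beyond 2 GB.
    record BoxHeader(String type, long payloadOffset, long boxSize) {}

    static BoxHeader readHeader(MemorySegment file, long offset) {
        // 32-bit size field, treated as unsigned.
        long size = Integer.toUnsignedLong(file.get(U32_BE, offset));

        // Four-character box type, copied straight into a small array.
        byte[] fourcc = new byte[4];
        MemorySegment.copy(file, ValueLayout.JAVA_BYTE, offset + 4, fourcc, 0, 4);
        String type = new String(fourcc, StandardCharsets.US_ASCII);

        long headerLength = 8;
        if (size == 1) {
            // A 64-bit size follows the type field.
            size = file.get(U64_BE, offset + 8);
            headerLength = 16;
        } else if (size == 0) {
            // The box extends to the end of the file.
            size = file.byteSize() - offset;
        }
        return new BoxHeader(type, offset + headerLength, size);
    }

    public static void main(String[] args) {
        // Tiny in-memory stand-in for a file: an 'ftyp' box (16 bytes)
        // followed by an 'mdat' box that uses the 64-bit size form.
        byte[] data = {
            0, 0, 0, 16, 'f', 't', 'y', 'p', 0, 0, 0, 0, 0, 0, 0, 0,
            0, 0, 0, 1, 'm', 'd', 'a', 't', 0, 0, 0, 0, 0, 0, 0, 20,
            9, 9, 9, 9
        };
        MemorySegment file = MemorySegment.ofArray(data);
        long offset = 0;
        while (offset < file.byteSize()) {
            BoxHeader header = readHeader(file, offset);
            System.out.println(header);
            offset += header.boxSize();
        }
    }
}
```

The same readHeader call should work unchanged on a segment obtained by mapping a file, since it only relies on long offsets into the segment.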

You can also use MemoryLayout and VarHandle for structured access.
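A minimal sketch of the structured approach, again assuming the finalized Java 22 API (where a layout-derived VarHandle takes the segment plus a base offset; preview releases used slightly different coordinates). The DIMENSIONS layout is a hypothetical fixed-size record with two big-endian 32-bit fields, not a box definition taken from the HEIF specification.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemoryLayout.PathElement;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

public class StructuredAccess {
    // Hypothetical record: width and height as big-endian 32-bit integers.
    static final StructLayout DIMENSIONS = MemoryLayout.structLayout(
            ValueLayout.JAVA_INT.withOrder(ByteOrder.BIG_ENDIAN).withName("width"),
            ValueLayout.JAVA_INT.withOrder(ByteOrder.BIG_ENDIAN).withName("height"));

    // One VarHandle per named field; the layout works out the field offsets.
    static final VarHandle WIDTH =
            DIMENSIONS.varHandle(PathElement.groupElement("width"));
    static final VarHandle HEIGHT =
            DIMENSIONS.varHandle(PathElement.groupElement("height"));

    static long pixelCount(MemorySegment record) {
        // Coordinates are (segment, base offset); 0L means the record
        // starts at the beginning of the segment.
        int width = (int) WIDTH.get(record, 0L);
        int height = (int) HEIGHT.get(record, 0L);
        return (long) width * height;
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            // Native allocation sized and aligned for the layout.
            MemorySegment record = arena.allocate(DIMENSIONS);
            WIDTH.set(record, 0L, 1920);
            HEIGHT.set(record, 0L, 1080);
            System.out.println(pixelCount(record));
        }
    }
}
```

Note that ValueLayout.JAVA_INT carries a 4-byte alignment constraint. For fields that sit at arbitrary offsets inside a mapped file, the unaligned variants (such as JAVA_INT_UNALIGNED) are probably the safer choice.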

You have more options with the new MemorySegment API than you're given with the ByteBuffer API.
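For the large-file case in the question, one of those options is mapping the whole file as a single segment instead of juggling several MappedByteBuffers. The sketch below assumes the finalized Java 22 API, in particular the FileChannel.map overload that takes an Arena and returns a MemorySegment. MappedRead and readBytes are illustrative names only.

```java
import java.io.IOException;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class MappedRead {
    // Copies `length` bytes starting at a long `offset` out of the file.
    // The offset is a long, so it may lie beyond the 2 GB ByteBuffer limit.
    static byte[] readBytes(Path file, long offset, int length) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ);
             Arena arena = Arena.ofConfined()) {
            // One mapping for the whole file; it is unmapped when the arena closes.
            MemorySegment mapped =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size(), arena);
            byte[] out = new byte[length];
            MemorySegment.copy(mapped, ValueLayout.JAVA_BYTE, offset, out, 0, length);
            return out;
        }
    }

    public static void main(String[] args) throws IOException {
        // Small temporary file standing in for a large image file.
        Path file = Files.createTempFile("mapped-read", ".bin");
        try {
            Files.write(file, new byte[] {1, 2, 3, 4, 5, 6, 7, 8});
            System.out.println(Arrays.toString(readBytes(file, 2, 3)));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

In a real reader you would presumably keep the arena and the mapped segment open for the lifetime of the parsed file rather than remapping on every read; the per-call mapping here just keeps the sketch self-contained.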