I am using Java to read (potentially) large amounts of data from (potentially) large files - the scenario is uncompressed imagery from a file format like HEIF. Files larger than 2 GB are likely. Writing is a future need, but this question is scoped to reading.
The HEIF format (which is derived from the ISO Base Media File Format - ISO/IEC 14496-12) is a sequence of variable-size "boxes" - you read the length and kind of each box, then do whatever parsing is appropriate to that box. In my design, I'll parse out the small-ish boxes and keep references to the bulk-storage (mdat) offsets so I can pull the data out for rendering / processing as requested.
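For concreteness, this is a minimal sketch of the kind of box scan I have in mind over a mapped MemorySegment (Java 22+ FFM API; the `Box` record and the class name are just placeholders, and error handling is minimal):

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class BoxScanner {

    // ISO BMFF fields are big-endian, and box offsets aren't guaranteed to be
    // aligned, so use the unaligned layouts with explicit byte order.
    private static final ValueLayout.OfInt  U32 =
            ValueLayout.JAVA_INT_UNALIGNED.withOrder(ByteOrder.BIG_ENDIAN);
    private static final ValueLayout.OfLong U64 =
            ValueLayout.JAVA_LONG_UNALIGNED.withOrder(ByteOrder.BIG_ENDIAN);

    record Box(String type, long offset, long size, long payloadOffset) {}

    public static void main(String[] args) throws Exception {
        Path file = Path.of(args[0]);
        try (Arena arena = Arena.ofConfined();
             FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {

            // One segment over the whole file, no 2 GB limit.
            MemorySegment segment =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size(), arena);

            long offset = 0;
            while (offset < segment.byteSize()) {
                long size = Integer.toUnsignedLong(segment.get(U32, offset));
                byte[] typeBytes = segment.asSlice(offset + 4, 4).toArray(ValueLayout.JAVA_BYTE);
                String type = new String(typeBytes, StandardCharsets.US_ASCII);

                long headerLen = 8;
                if (size == 1) {                 // 64-bit "largesize" follows the type
                    size = segment.get(U64, offset + 8);
                    headerLen = 16;
                } else if (size == 0) {          // box runs to the end of the file
                    size = segment.byteSize() - offset;
                }
                if (size < headerLen) {
                    throw new IllegalStateException("Corrupt box at offset " + offset);
                }

                // Small boxes would be parsed here; for mdat I just record where
                // the payload lives so imagery can be pulled out later.
                System.out.println(new Box(type, offset, size, offset + headerLen));

                offset += size;
            }
        }
    }
}
```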
I'm considering two options - multiple MappedByteBuffers (since each is limited to 2 GB), or a single MemorySegment over a memory-mapped file. It's not clear to me which is likely to be more efficient. The MappedByteBuffer has all the nice ByteBuffer API, but I need to manage multiple instances. The MemorySegment will be a single entity, but it looks like I'll need to create slice views to get anything I can read from (e.g. a byte array or a ByteBuffer), which looks like a different version of the same problem. A secondary benefit of the MemorySegment is that it may lead to a nicer design when I need to use some non-Java API (like feeding the imagery into a hardware encoder for compression). I also have a skeleton of the MemorySegment approach implemented and reading (just with some gross assumptions that I can turn it into a single ByteBuffer).
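To make the slicing concern concrete, this is the sort of per-payload accessor I imagine handing out from the single mapped segment - the class and field names (`ImageryAccess`, `payloadOffset`, ...) are made up for illustration, and the ByteBuffer view only works when the slice fits in an int:

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteBuffer;

final class ImageryAccess {

    private final MemorySegment fileSegment;   // the whole mapped file
    private final long payloadOffset;          // offset into mdat, recorded while parsing boxes
    private final long payloadLength;

    ImageryAccess(MemorySegment fileSegment, long payloadOffset, long payloadLength) {
        this.fileSegment = fileSegment;
        this.payloadOffset = payloadOffset;
        this.payloadLength = payloadLength;
    }

    /** ByteBuffer view over one payload; no copy, but the slice must fit in an int. */
    ByteBuffer asByteBuffer() {
        return fileSegment.asSlice(payloadOffset, payloadLength).asByteBuffer();
    }

    /** Copy part of the payload into a heap array, e.g. one tile or row of pixels. */
    byte[] copyRange(long relativeOffset, int length) {
        byte[] dst = new byte[length];
        MemorySegment.copy(fileSegment, ValueLayout.JAVA_BYTE,
                           payloadOffset + relativeOffset, dst, 0, length);
        return dst;
    }

    /** Zero-copy slice, which is what I'd hope to hand to native code later. */
    MemorySegment asSegment() {
        return fileSegment.asSlice(payloadOffset, payloadLength);
    }
}
```

My worry is that this just reintroduces the "many views" bookkeeping that pushed me away from multiple MappedByteBuffers in the first place, which is what prompts the question below.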
Are there emerging patterns for efficient reading from a MemorySegment? Failing that, is there something I'm missing in the MemorySegment API?