
Given a byte[], when we want to work with it we often need only pieces of it. In my particular case I get a byte[] from the wire where the first 4 bytes describe the length of the message, the next 4 bytes the type of the message (an integer that maps to a concrete protobuf class), and the remaining bytes are the actual content of the message, like this:

length|type|content
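A minimal sketch of reading that header in place. It assumes (the question does not say) that both header fields are 4-byte big-endian ints and that the length counts only the content bytes; `FrameLayout` and its methods are hypothetical names. `ByteBuffer.getInt(int)` is an absolute read, so the header is decoded without copying anything:

```java
import java.nio.ByteBuffer;

// Sketch of the length|type|content layout.
// Assumptions: both header fields are 4-byte big-endian ints and
// "length" counts only the content bytes.
final class FrameLayout {
    static final int HEADER_SIZE = 8;            // 4 bytes length + 4 bytes type
    static final int CONTENT_OFFSET = HEADER_SIZE;

    static int length(byte[] frame) {
        // absolute read at index 0; ByteBuffer is big-endian by default
        return ByteBuffer.wrap(frame).getInt(0);
    }

    static int type(byte[] frame) {
        // absolute read at index 4
        return ByteBuffer.wrap(frame).getInt(4);
    }

    public static void main(String[] args) {
        // length = 3, type = 7, content = {1, 2, 3}
        byte[] frame = {0, 0, 0, 3, 0, 0, 0, 7, 1, 2, 3};
        System.out.println("length=" + length(frame)
                + " type=" + type(frame)
                + " content starts at " + CONTENT_OFFSET);
    }
}
```

Reading the header is the easy part; the content range [8, 8 + length) is what the parser needs, which is where the trouble starts.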

To parse this message I have to pass the content part to a specific class that knows how to parse an instance from it. The problem is that often no method is provided that lets you specify which range of the array the parser should read.

So what we end up doing is copying the remaining chunks of the array, which is not efficient.

As far as I know, in Java it is not possible to create another byte[] reference that actually refers to a sub-range of a bigger original byte[] using just two indexes (this was the approach String used to take, and it led to memory leaks).

I wonder how we solve situations like this. I suppose giving up on protobuf just because it does not provide some parseFrom(byte[], int, int) does not make sense; protobuf is just an example, and any library could lack that API.

So does this force us to write inefficient code, or is there something that can be done (apart from adding that method)?
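For reference, the copying workaround described above usually looks like the sketch below. `Arrays.copyOfRange` allocates a new array and copies the content bytes into it, so the result is independent of the original buffer and costs time and memory proportional to the content length. `CopyDemo` and `contentCopy` are hypothetical names:

```java
import java.util.Arrays;

// The copying workaround: copyOfRange allocates a fresh array and copies
// indices [headerSize, frame.length) into it.
final class CopyDemo {
    static byte[] contentCopy(byte[] frame, int headerSize) {
        return Arrays.copyOfRange(frame, headerSize, frame.length);
    }

    public static void main(String[] args) {
        byte[] frame = {0, 0, 0, 3, 0, 0, 0, 7, 1, 2, 3};
        byte[] content = contentCopy(frame, 8);
        content[0] = 42;                                // modify the copy...
        System.out.println(frame[8]);                   // ...original unchanged: prints 1
        System.out.println(Arrays.toString(content));   // prints [42, 2, 3]
    }
}
```

The independence shown by the two print statements is exactly what makes this safe, and also what makes it a copy.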

vach
  • Use a `ByteBuffer`; it's the best class for this in the JDK. Manipulating it is a little tricky though (beware of positions!). – fge Sep 30 '15 at 09:04
  • I know, but even with ByteBuffer you don't have parse(ByteBuffer, int, int); when you get the remaining bytes out of the ByteBuffer you end up creating another byte[], which is what I want to avoid. – vach Sep 30 '15 at 09:05
  • Sorry, but I don't get your point at all. If you really want an array which is only part of the original, you don't really have a choice. – fge Sep 30 '15 at 09:06
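A small sketch of both sides of this exchange. `ByteBuffer.wrap(array, offset, length).slice()` does give a view over part of the array without copying (writes through the view show up in the original), but `array()` on that view still returns the whole backing array, with the slice starting at `arrayOffset()`. So a `parseFrom(byte[])`-only API still cannot use it directly. `SliceDemo` and `contentView` are hypothetical names:

```java
import java.nio.ByteBuffer;

// A no-copy view over the content part of a frame.
final class SliceDemo {
    static ByteBuffer contentView(byte[] frame, int headerSize) {
        // wrap(array, offset, length) sets position = offset and
        // limit = offset + length; slice() makes a buffer whose
        // index 0 is array[headerSize]
        return ByteBuffer.wrap(frame, headerSize, frame.length - headerSize).slice();
    }

    public static void main(String[] args) {
        byte[] frame = {0, 0, 0, 3, 0, 0, 0, 7, 1, 2, 3};
        ByteBuffer view = contentView(frame, 8);
        view.put(0, (byte) 42);                  // absolute write through the view
        System.out.println(frame[8]);            // prints 42: same storage
        System.out.println(view.remaining());    // prints 3
        // array() is the WHOLE backing array; the slice begins at arrayOffset()
        System.out.println(view.arrayOffset());  // prints 8
    }
}
```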

3 Answers


Normally you would tackle this kind of thing with streams.

A stream is an abstraction for reading just what you need to process the current block of data. So you can read the correct number of bytes into a byte array and pass it to your parse function.

You ask 'So does this force us to write inefficient code or there is something that can be done?'

Usually you get your data in the form of a stream, and then the technique demonstrated below performs better because you skip one copy: two copies instead of three (once by the OS and once by you), since you never copy the whole message into a byte array before you start parsing. If you actually start out with a byte[] that you construct yourself, you may want to construct an object such as { int length, int type, byte[] contentBytes } instead and pass contentBytes to your parse function.

If you really, really have to start out with a byte[], then the technique below is just a more convenient way to parse it; it would not be faster.
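A minimal sketch of that { int length, int type, byte[] contentBytes } object, for the case where you build the data yourself. `Frame` is a hypothetical name; on Java 16+ a record would do the same job. Deriving the length from the array keeps the two from drifting apart:

```java
// Keep the parts separate instead of packing them into one byte[],
// so contentBytes can go straight to a parseFrom(byte[]) method.
final class Frame {
    final int length;
    final int type;
    final byte[] contentBytes;

    Frame(int type, byte[] contentBytes) {
        this.type = type;
        this.contentBytes = contentBytes;
        this.length = contentBytes.length;  // derived, never out of sync
    }

    public static void main(String[] args) {
        Frame f = new Frame(7, new byte[] {1, 2, 3});
        System.out.println(f.type + " " + f.length); // prints "7 3"
    }
}
```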

So suppose you got a buffer of bytes from somewhere and you want to read the contents of that buffer. First you convert it to a stream:

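import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

private static List<Content> read(byte[] buffer) {
    try {
        ByteArrayInputStream bytesStream = new ByteArrayInputStream(buffer);
        return read(bytesStream);
    } catch (IOException e) {
        throw new UncheckedIOException(e); // every path must return or throw
    }
}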

The above function wraps the byte array with a stream and passes it to the function that does the actual reading. If you can start out from a stream then obviously you can skip the above step and just pass that stream into the below function directly:

private static List<Content> read(InputStream bytesStream) throws IOException {
    List<Content> results = new ArrayList<Content>();
    try {
        // read the content...
        Content content1 = readContent(bytesStream);
        results.add(content1);

        // I don't know if there's more than one content block but assuming
        // that there is, you can just continue reading the stream...
        //
        // If it's a fixed number of content blocks then just read them one
        // after the other... Otherwise make this a loop
        Content content2 = readContent(bytesStream);
        results.add(content2);
    } finally {
        bytesStream.close();
    }
    return results;
}
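The function above reads a fixed number of blocks. If the number of blocks is not known, one way to "make this a loop" is to peek one byte before each frame: -1 means the stream ended cleanly between frames, while a stream that ends inside a frame still raises an exception. A sketch under the same assumptions as this answer (big-endian ints, length = content bytes only); `FrameLoop` and `RawFrame` are hypothetical stand-ins for the Content classes:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.ArrayList;
import java.util.List;

final class FrameLoop {
    static final class RawFrame {
        final int type;
        final byte[] content;

        RawFrame(int type, byte[] content) {
            this.type = type;
            this.content = content;
        }
    }

    static List<RawFrame> readAll(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 1);
        DataInputStream data = new DataInputStream(pushback);
        List<RawFrame> frames = new ArrayList<>();
        int next;
        // peek one byte: -1 means the stream ended cleanly between frames
        while ((next = pushback.read()) != -1) {
            pushback.unread(next);          // put it back for readInt()
            int length = data.readInt();
            int type = data.readInt();
            byte[] content = new byte[length];
            data.readFully(content);        // EOFException if truncated
            frames.add(new RawFrame(type, content));
        }
        return frames;
    }

    public static void main(String[] args) throws IOException {
        // two frames: (type 10, content {5, 6}) and (type 11, content {9})
        byte[] wire = {0, 0, 0, 2, 0, 0, 0, 10, 5, 6,
                       0, 0, 0, 1, 0, 0, 0, 11, 9};
        List<RawFrame> frames = readAll(new ByteArrayInputStream(wire));
        System.out.println(frames.size()); // prints 2
    }
}
```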

Since your byte-array contains content you will want to read Content blocks from the stream. Since you have a length and a type field, I am assuming that you have different kinds of content blocks. The next function reads the length and type and passes the processing of the content bytes on to the proper class depending on the read type:

private static Content readContent(InputStream stream) throws IOException {
    final int CONTENT_TYPE_A = 10;
    final int CONTENT_TYPE_B = 11;

    // wrap the InputStream in a DataInputStream because the latter has
    // convenience functions to convert bytes to integers, etc.
    // Note that DataInputStream reads multi-byte values in big-endian order,
    // so check that your bytes use the same byte order. If they are
    // little-endian, convert each value with Integer.reverseBytes(...) or
    // read the header through a ByteBuffer set to ByteOrder.LITTLE_ENDIAN.
    DataInputStream data = new DataInputStream(stream);
    int length = data.readInt();
    int type = data.readInt();

    // I'm assuming that above length field was the number of bytes for the
    // content. So, read length number of bytes into a buffer and pass that 
    // to your `parseFrom(byte[])` function 
    byte[] contentBytes = new byte[length];
    // readFully() keeps reading until the array is full and throws
    // EOFException if the stream ends first; a single read() call may
    // legally return fewer bytes than requested even mid-stream.
    data.readFully(contentBytes);

    switch (type) {
        case CONTENT_TYPE_A:
            return ContentTypeA.parseFrom(contentBytes);
        case CONTENT_TYPE_B:
            return ContentTypeB.parseFrom(contentBytes);
        default:
            throw new UnsupportedOperationException();
    }
}
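To build input in the layout readContent() expects, for example in a test, `DataOutputStream` is the natural counterpart: `writeInt` writes big-endian, matching `DataInputStream.readInt`. A sketch, assuming as above that the length field counts only the content bytes; `FrameWriter` and `writeFrame` are hypothetical names:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class FrameWriter {
    static byte[] writeFrame(int type, byte[] content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(content.length);   // length field (content bytes only)
        out.writeInt(type);             // type field
        out.write(content);             // payload
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] frame = writeFrame(10, new byte[] {1, 2, 3});
        // read it back the same way readContent() does
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(frame));
        int length = in.readInt();
        int type = in.readInt();
        byte[] content = new byte[length];
        in.readFully(content);
        System.out.println(frame.length + " " + length + " " + type); // prints "11 3 10"
    }
}
```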

I have made up the below Content classes. I don't know what protobuf is but it can apparently convert from a byte array to an actual object with its parseFrom(byte[]) function, so take this as pseudocode:

class Content {
    // common functionality
}

class ContentTypeA extends Content {
    public static ContentTypeA parseFrom(byte[] contentBytes) {
        return null; // do the actual parsing of a type A content 
    }
}

class ContentTypeB extends Content {
    public static ContentTypeB parseFrom(byte[] contentBytes) {
        return null; // do the actual parsing of a type B content
    }
}
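As an alternative to the switch in readContent(), the "integer that maps to a concrete class" from the question can be expressed as a map from type to parse function, so new types are registered rather than added as cases. A sketch only; `ContentRegistry`, `register` and `parse` are hypothetical names. Note that protobuf's generated `parseFrom(byte[])` throws the checked `InvalidProtocolBufferException`, so it would need a small wrapper to fit `Function`:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class ContentRegistry<T> {
    private final Map<Integer, Function<byte[], ? extends T>> parsers = new HashMap<>();

    void register(int type, Function<byte[], ? extends T> parser) {
        parsers.put(type, parser);
    }

    T parse(int type, byte[] contentBytes) {
        Function<byte[], ? extends T> parser = parsers.get(type);
        if (parser == null) {
            throw new IllegalArgumentException("Unknown content type: " + type);
        }
        return parser.apply(contentBytes);
    }

    public static void main(String[] args) {
        ContentRegistry<String> registry = new ContentRegistry<>();
        // stand-in parsers; real ones would decode the bytes
        registry.register(10, bytes -> "A(" + bytes.length + ")");
        registry.register(11, bytes -> "B(" + bytes.length + ")");
        System.out.println(registry.parse(10, new byte[] {1, 2})); // prints "A(2)"
    }
}
```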
Rob Meeuwisse

In Java, an array is not just a section of memory: it is an object with additional fields (at least its length). So you cannot refer to part of an array; you have to either:

  • Use array-copy functions, or
  • Implement and use an algorithm that works on only part of the byte array.
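As an illustration of the second option, here is an algorithm that takes the whole array plus an offset, so nothing is copied. `OffsetRead.readIntBigEndian` is a hypothetical helper that decodes a 4-byte big-endian int (the same byte order as `DataInputStream.readInt`) at any position:

```java
// Works on part of the array by index instead of on a copy.
final class OffsetRead {
    static int readIntBigEndian(byte[] array, int offset) {
        // & 0xFF undoes the sign extension of Java's signed bytes
        return ((array[offset] & 0xFF) << 24)
                | ((array[offset + 1] & 0xFF) << 16)
                | ((array[offset + 2] & 0xFF) << 8)
                | (array[offset + 3] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] frame = {0, 0, 0, 3, 0, 0, 0, 7, 1, 2, 3};
        System.out.println(readIntBigEndian(frame, 0)); // length: prints 3
        System.out.println(readIntBigEndian(frame, 4)); // type: prints 7
    }
}
```

An out-of-range offset still fails safely with `ArrayIndexOutOfBoundsException`, because every access goes through normal array indexing.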
Pavel Uvarov

The concern seems to be that there is no way to create a view over an array (e.g., an array equivalent of List#subList()). A workaround might be to make your parsing methods take a reference to the entire array plus two indices (or an index and a length) specifying the sub-array the method should work on.

This would not prevent the methods from reading or modifying sections of the array they should not touch. Perhaps a ByteArrayView class could be written to add a little safety if this is a concern:
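Applied to the framing from the question, that workaround could look like the sketch below: walk every length|type|content frame in one buffer and hand each handler the original array with an offset and a length. It assumes big-endian header ints and a length that counts only content bytes; `FrameWalker`, `FrameHandler` and `forEachFrame` are hypothetical names. (Before writing such code it is worth checking the library: protobuf-java's `Parser` interface, for instance, declares a `parseFrom(byte[], int, int)` overload.)

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class FrameWalker {
    interface FrameHandler {
        void onFrame(int type, byte[] array, int offset, int length);
    }

    static void forEachFrame(byte[] buffer, FrameHandler handler) {
        ByteBuffer view = ByteBuffer.wrap(buffer);   // big-endian by default
        int pos = 0;
        while (pos < buffer.length) {
            // absolute reads; a truncated header throws IndexOutOfBoundsException
            int length = view.getInt(pos);
            int type = view.getInt(pos + 4);
            int contentStart = pos + 8;
            if (length < 0 || length > buffer.length - contentStart) {
                throw new IllegalArgumentException("Truncated frame at " + pos);
            }
            handler.onFrame(type, buffer, contentStart, length); // no copy
            pos = contentStart + length;
        }
    }

    public static void main(String[] args) {
        byte[] wire = {0, 0, 0, 2, 0, 0, 0, 10, 5, 6,
                       0, 0, 0, 1, 0, 0, 0, 11, 9};
        List<String> seen = new ArrayList<>();
        forEachFrame(wire, (type, array, offset, length) ->
                seen.add(type + "@" + offset + "+" + length));
        System.out.println(seen); // prints [10@8+2, 11@18+1]
    }
}
```

The handler still receives the whole array, so nothing stops it from reading outside its range.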

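public class ByteArrayView {
  private final byte[] array;
  private final int start;
  private final int length;

  public ByteArrayView(byte[] array, int start, int length) {
    // reject views that would reach outside the backing array
    if (start < 0 || length < 0 || start > array.length - length) {
      throw new IndexOutOfBoundsException("Invalid range");
    }
    this.array = array;
    this.start = start;
    this.length = length;
  }

  public byte get(int index) {
    if (index < 0 || index >= length) {
      throw new IndexOutOfBoundsException("Index: " + index + ", length: " + length);
    }
    return array[start + index];
  }

  public int length() {
    return length;
  }
}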

But if, on the other hand, performance is a concern, then a method call to get() for fetching each byte is probably undesirable.

The code is for illustration; it's not tested or anything.

EDIT

On a second reading of my own answer, I realized I should point this out: a ByteArrayView still copies each byte you read out of the original array, just one byte at a time rather than as a chunk. It would be inadequate for the OP's concerns.

Jae Heon Lee