
In ByteToMessageDecoder (https://github.com/netty/netty/blob/master/codec/src/main/java/io/netty/handler/codec/ByteToMessageDecoder.java), which ReplayingDecoder derives from, the cumulation ByteBuf (used to accumulate data until enough has been read from the network to begin decoding) appears to be implemented like a dynamic array.

By this I mean that if the current cumulation ByteBuf has enough capacity to hold the incoming data, the data is copied into it. If capacity is insufficient, a larger ByteBuf is allocated, and both the previous cumulation and the incoming data are copied into the newly allocated instance. Is there a reason a CompositeByteBuf with a bounded number of components is not used here instead?
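For reference, here is a minimal sketch of the copy-based strategy as I read it (not the actual Netty source; the method name and growth policy are mine):

    import io.netty.buffer.ByteBuf;
    import io.netty.buffer.ByteBufAllocator;

    // Sketch of the "dynamic array" cumulation described above.
    static ByteBuf mergeCumulate(ByteBufAllocator alloc, ByteBuf cumulation, ByteBuf in) {
        try {
            if (in.readableBytes() > cumulation.writableBytes()) {
                // Insufficient capacity: allocate a larger buffer and copy the
                // old cumulation into it, like a dynamic array growing.
                ByteBuf expanded = alloc.buffer(cumulation.readableBytes() + in.readableBytes());
                expanded.writeBytes(cumulation);
                cumulation.release();
                cumulation = expanded;
            }
            // Copy the incoming bytes into the (possibly new) cumulation buffer.
            cumulation.writeBytes(in);
            return cumulation;
        } finally {
            in.release(); // the data has been copied, so the source can be freed
        }
    }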

Using a PooledByteBufAllocator should help reduce the number of memory allocations, but it still seems to me that combining a CompositeByteBuf with a PooledByteBufAllocator would be the most efficient solution, since it would minimize both allocations and copies.
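What I have in mind is roughly the following (again just a sketch; the method name is mine, and I am assuming the addComponent(boolean, ByteBuf) overload available in recent Netty versions):

    import io.netty.buffer.ByteBuf;
    import io.netty.buffer.ByteBufAllocator;
    import io.netty.buffer.CompositeByteBuf;

    // Zero-copy alternative: accumulate incoming buffers as components instead
    // of copying them. The component count could be bounded by consolidating
    // once it grows past some threshold.
    static ByteBuf compositeCumulate(ByteBufAllocator alloc, ByteBuf cumulation, ByteBuf in) {
        CompositeByteBuf composite;
        if (cumulation instanceof CompositeByteBuf) {
            composite = (CompositeByteBuf) cumulation;
        } else {
            composite = alloc.compositeBuffer();
            composite.addComponent(true, cumulation); // true: advance writerIndex
        }
        composite.addComponent(true, in); // takes ownership of 'in'; no copy
        return composite;
    }

Since addComponent takes ownership of the incoming buffer, no bytes are copied; the trade-off is that later reads may cross component boundaries.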

However, before I go down the rabbit hole of implementing my own pipeline stage for zero-copy aggregation, I wanted to ask whether there is a particular reason for the current implementation. For example, is a CompositeByteBuf already being constructed under the hood by one of the copy calls, or has someone found empirically that the current strategy performs better?

Thanks in advance

  • How do you intend to use these buffers after the ByteToMessageDecoder is done? These are just assumptions, but perhaps the current implementation makes the following trade-offs: 1) A few codecs have headers that define the size of the buffer, which may allow them to resize only once (or relatively few times). 2) It is assumed most applications will read the buffer sequentially, and having it be contiguous is beneficial for this use case. 3) Copying the data into a contiguous buffer allows easier use for applications. Anyway, it would be interesting to see benchmarks. – Scott Mitchell Nov 11 '14 at 01:05
  • So, once the aggregation occurs, the contents will be written directly to a GatheringByteChannel. The buffers themselves will be pooled and direct. My assumption is that it will be more efficient to let the OS perform a gathering write from the native buffers to the channel instead of performing (potentially multiple) allocations and copies to form a contiguous region for transfer. My experience using gathering writes for Unix file descriptors in C on Linux leads me to believe this is the most performant way to accomplish this, but I've never attempted it with the Java NIO facilities (see the sketch after these comments). – Kenneth Owens Dec 05 '14 at 21:57
  • Sounds reasonable. Care to submit a pull request to https://github.com/netty/netty? We love contributions :) – Scott Mitchell Dec 09 '14 at 23:38
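For what it's worth, here is roughly the gathering write described in the comment above (a sketch with placeholder names; it also ignores the zero-return case a non-blocking channel can produce):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.GatheringByteChannel;
    import io.netty.buffer.CompositeByteBuf;

    // A CompositeByteBuf can expose its components as an array of NIO
    // ByteBuffers, letting the OS gather them in a single write instead of
    // requiring a copy into one contiguous region first.
    static void gatheringWrite(GatheringByteChannel channel, CompositeByteBuf aggregated)
            throws IOException {
        ByteBuffer[] buffers = aggregated.nioBuffers(); // no copy for direct components
        long remaining = aggregated.readableBytes();
        while (remaining > 0) {
            // Partial writes advance the ByteBuffer positions, so retrying
            // with the same array picks up where the last write stopped.
            remaining -= channel.write(buffers);
        }
        aggregated.release();
    }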

0 Answers