In ByteToMessageDecoder (https://github.com/netty/netty/blob/master/codec/src/main/java/io/netty/handler/codec/ByteToMessageDecoder.java), which ReplayingDecoder derives from, the cumulation ByteBuf (used to accumulate data until enough has been read from the network to begin decoding) seems to be implemented like a dynamic array.
By this I mean that, if the current cumulation ByteBuf has enough capacity to hold the incoming data, the incoming data is copied into it. If there is insufficient capacity, a larger ByteBuf is allocated and both the previous cumulation ByteBuf and the incoming data are written into the newly allocated instance (a sketch of this is below). Is there a reason that a CompositeByteBuf with a bounded number of components is not used here instead?
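To make the behavior I'm describing concrete, here is a minimal sketch of that merge-style cumulation, written against the Netty 4.x ByteBuf API; the method name and the exact expansion condition are my own illustration, not the actual code in ByteToMessageDecoder:

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.ByteBufAllocator;

// Illustrative sketch of "dynamic array" style cumulation: bytes are always
// copied into a single contiguous cumulation buffer, which is reallocated
// and re-copied whenever it runs out of room.
static ByteBuf mergeCumulate(ByteBufAllocator alloc, ByteBuf cumulation, ByteBuf in) {
    ByteBuf result;
    if (cumulation.writerIndex() > cumulation.maxCapacity() - in.readableBytes()) {
        // Not enough room: allocate a larger buffer and copy the old cumulation into it.
        ByteBuf expanded = alloc.buffer(cumulation.readableBytes() + in.readableBytes());
        expanded.writeBytes(cumulation);
        cumulation.release();
        result = expanded;
    } else {
        result = cumulation;
    }
    // Either way, the incoming bytes are copied into the (possibly new) cumulation buffer.
    result.writeBytes(in);
    in.release();
    return result;
}
```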
Using a PooledByteBufAllocator should help reduce the number of memory allocations, but it still seems to me that using a CompositeByteBuf in conjunction with a PooledByteBufAllocator would be the most efficient solution, as it would minimize both memory allocations and copies (see the sketch below).
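For comparison, this is roughly the composite-style cumulation I have in mind; again just a sketch against the Netty 4.x API, with an arbitrary illustrative bound on the number of components:

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.ByteBufAllocator;
import io.netty.buffer.CompositeByteBuf;

// Illustrative sketch of composite cumulation: instead of copying the incoming
// bytes, the incoming ByteBuf is added as a new component of a CompositeByteBuf.
static ByteBuf compositeCumulate(ByteBufAllocator alloc, ByteBuf cumulation, ByteBuf in) {
    CompositeByteBuf composite;
    if (cumulation instanceof CompositeByteBuf) {
        composite = (CompositeByteBuf) cumulation;
    } else {
        // First chunk: wrap the existing cumulation in a composite with a bounded
        // maximum number of components (16 is an arbitrary bound for illustration).
        composite = alloc.compositeBuffer(16);
        composite.addComponent(true, cumulation);
    }
    // Zero-copy: the incoming buffer becomes a component; ownership transfers,
    // so it is not released here.
    composite.addComponent(true, in);
    return composite;
}
```

My understanding is that once the maximum component count is exceeded, CompositeByteBuf consolidates its components into a single buffer, which is what I meant by a "bounded number of components" above.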
However, before I go down the rabbit hole of implementing my own pipeline stage for zero-copy aggregation, I wanted to ask whether there is a particular reason for the current implementation (e.g., is a CompositeByteBuf being constructed under the hood by one of the copy calls, or has someone already found that the current strategy is empirically better?).
Thanks in advance