Are the ByteBuffer/IntBuffer/ShortBuffer Java classes fast?

Question

I'm working on an Android application (in Java, obviously) and I recently updated my UDP reader code. In both versions, I set up some buffers and receive a UDP packet:

byte[] buf = new byte[10000];
short[] soundData = new short[1000];
DatagramPacket packet = new DatagramPacket (buf, buf.length);
socket.receive (packet);

In the initial version, I put the data back together one byte at a time (it's actually 16-bit PCM audio data):

for (int i = 0; i < count; i++)
    soundData[i] = (short) (((buf[k++]&0xff) << 8) + (buf[k++]&0xff));

In the updated version, I used some cool Java tools I didn't know about when I started:

bBuffer  = ByteBuffer.wrap (buf);
sBuffer  = bBuffer.asShortBuffer();
sBuffer.get (soundData, 0, count);

In both cases, "count" is being populated correctly (I checked). However, there appear to be new problems with my streaming audio -- perhaps it isn't being handled fast enough -- which doesn't make any sense to me. Obviously, the buffer code is compiling into a lot more than three statements of JVM code, but it sure seemed like a reasonable assumption when I started this that the 2nd version would be faster than the 1st.

To be clear, I'm not insisting that my code HAS to use Java NIO buffers, but at first glance at least, it DOES seem like a mo' betta' way to go about this.

Anybody got any recommendations for a fast, simple Java UDP reader and whether there is a generally accepted "best way"??

Thanks, R.

NIO isn't intended to be faster than "normal" IO, it's just more scalable. — skaffman, Oct 27 '10 at 21:30
`stole the syntax` Lol are you serious? Google stole Java syntax as much as an author steals the English syntax — Falmarri, Oct 27 '10 at 22:39
@Tim Bender what does that have to do with the question? There might be differences between the Dalvik vm vs. standard jvm, but in general if things are slow in the jvm they are likely to also be slow on Android. Do you have some knowledge of a difference that might affect this particular case? — Cheryl Simon, Oct 27 '10 at 22:40
@Mayra, the library itself is completely different. Android library implementations != Sun library implementations. Also, Java-Like code compiled for Android is not compiled down to Java bytecode but instead compiled down to Dalvik. The point? This question was originally tagged "Java" when it in fact should be tagged "android". Since Rich is an android developer and not a Java developer he should at least know how to properly tag his questions :) — Tim Bender, Oct 27 '10 at 23:53
@Falmarri, your analogy doesn't hold. Authors don't completely redefine the meaning of all words in the English language and then insist that others write novels using the proper definitions and expect the alternate translation to remain as elegant. — Tim Bender, Oct 27 '10 at 23:55
Authors sure as hell redefine the meanings of words. And google isn't insisting anything. If you want to write for android, use their api. Or not, you can write in C++. Or compile a python interpreter. It's open source. As for your response to @Mayra, [android] tagged questions are very legitimately also tagged [java]. Are people using the openJDK implementations not allowed to tag their questions [java] either? You clearly must work for Sun/Oracle to be trolling this hard. — Falmarri, Oct 28 '10 at 00:50

score 4 · Answer 1 · answered May 11 '12 at 08:28

Your code would be more efficient if instead of reading a packet into a byte array (copying the data from a native buffer into the array) and then wrapping it in a new ByteBuffer (creating a new object) and converting to a ShortBuffer (creating a new object) you set up your objects only once and avoided the copy.

You can do this by calling DatagramChannel.open() to create a DatagramChannel, then using channel.socket() to get its socket and binding or connecting that as usual. (Calling socket.getChannel() on that socket returns the same DatagramChannel.) The channel lets you read packets directly into an existing ByteBuffer (which you should create with ByteBuffer.allocateDirect for maximum efficiency). You can then use asShortBuffer() just once to create a view of your data as shorts, and read from that ShortBuffer each time you refill the ByteBuffer.

The code therefore looks like this:

 DatagramChannel channel = DatagramChannel.open();
 DatagramSocket socket = channel.socket();
 // code to bind/connect socket
 ByteBuffer buffer = ByteBuffer.allocateDirect (10000);
 // you may want to invoke buffer.order(...) here to tell it what byte order to use
 // (the view created below keeps whatever order is set at this point)
 ShortBuffer shortBuf = buffer.asShortBuffer();

 // in your receive loop:
 buffer.clear();
 channel.receive(buffer);
 shortBuf.position(0).limit(buffer.position()/2); // may ignore a byte if odd number received
 shortBuf.get(soundData,0,shortBuf.limit());

You should find this is much more efficient than your previous code, because it avoids an entire copy of the data and the format conversion is handled by hand-optimized code rather than compiler-generated byte manipulation, which may be suboptimal. It will be somewhat more efficient still if you use the platform-native byte order, in which case shortBuf.get() becomes a direct memory copy. (I believe Android uses little-endian byte order on all platforms it is available for, and your code above seems to be big-endian, so this may not be possible for you.)
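To make the byte-order point concrete, here is a minimal sketch of how the order set on a ByteBuffer changes what its ShortBuffer view returns. The class name ByteOrderSketch and the decode helper are hypothetical names for illustration, and a heap buffer from ByteBuffer.wrap is used only to keep the example self-contained; it says nothing about the speed of either order on a given device.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.ShortBuffer;

// Hypothetical illustration class, not part of the original answer.
class ByteOrderSketch {

    /** Decodes the first `count` 16-bit samples of `raw` using the given byte order. */
    static short[] decode(byte[] raw, int count, ByteOrder order) {
        ByteBuffer bytes = ByteBuffer.wrap(raw);
        // The order has to be set before asShortBuffer(): the view keeps
        // the byte order the ByteBuffer had when the view was created.
        bytes.order(order);
        ShortBuffer shorts = bytes.asShortBuffer();
        short[] out = new short[count];
        shorts.get(out, 0, count);
        return out;
    }

    public static void main(String[] args) {
        byte[] raw = {0x01, 0x02, (byte) 0xFF, (byte) 0xFE};
        short[] big = decode(raw, 2, ByteOrder.BIG_ENDIAN);
        short[] little = decode(raw, 2, ByteOrder.LITTLE_ENDIAN);
        System.out.printf("big-endian:    %04x %04x%n", big[0] & 0xffff, big[1] & 0xffff);
        System.out.printf("little-endian: %04x %04x%n", little[0] & 0xffff, little[1] & 0xffff);
        System.out.println("native order:  " + ByteOrder.nativeOrder());
    }
}
```

A newly created ByteBuffer is always big-endian, which matches the manual shifting in the question, so leaving the order alone reproduces the original decoding.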

Great answer, Jules; not sure why this isn't the accepted one. — Riyad Kalla, Aug 15 '13 at 17:55

score 2 · Accepted Answer · answered Oct 28 '10 at 00:43


In general, working with primitive types directly is going to be more efficient than working with objects because you avoid some of the overhead of creating objects, function calls, etc.

There are reasons to use the utility objects other than speed: convenience, safety, etc.

The best way to test the difference in this particular case would be to actually measure it. Try out both methods with a large dataset and time it. Then, you can decide if it is worth the benefits in this case.
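A rough sketch of such a measurement is below. It assumes a desktop JVM with lambda support and random bytes standing in for a received packet; DecodeTiming, decodeManually, decodeWithBuffers, and timeNanos are hypothetical names. Numbers from a loop like this are only indicative, and on Android the comparison would need to be repeated on the device itself.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Random;

// Hypothetical timing harness, not part of the original answer.
class DecodeTiming {

    /** The question's first version: assemble each big-endian short by hand. */
    static void decodeManually(byte[] buf, short[] out, int count) {
        int k = 0;
        for (int i = 0; i < count; i++) {
            out[i] = (short) (((buf[k++] & 0xff) << 8) + (buf[k++] & 0xff));
        }
    }

    /** The question's second version: wrap the array and read through a ShortBuffer view. */
    static void decodeWithBuffers(byte[] buf, short[] out, int count) {
        ByteBuffer.wrap(buf).asShortBuffer().get(out, 0, count);
    }

    /** Runs the task `iterations` times and returns the elapsed wall-clock nanoseconds. */
    static long timeNanos(Runnable task, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        byte[] buf = new byte[10000];
        new Random(42).nextBytes(buf);           // stand-in for a received packet
        int count = buf.length / 2;
        short[] manual = new short[count];
        short[] viaBuffer = new short[count];

        // Warm up both paths so JIT compilation does not dominate the timing.
        timeNanos(() -> decodeManually(buf, manual, count), 5_000);
        timeNanos(() -> decodeWithBuffers(buf, viaBuffer, count), 5_000);

        int iterations = 20_000;
        long manualNs = timeNanos(() -> decodeManually(buf, manual, count), iterations);
        long bufferNs = timeNanos(() -> decodeWithBuffers(buf, viaBuffer, count), iterations);

        System.out.println("results match: " + Arrays.equals(manual, viaBuffer));
        System.out.println("manual loop:   " + manualNs / iterations + " ns per packet");
        System.out.println("NIO buffers:   " + bufferNs / iterations + " ns per packet");
    }
}
```

Checking that both decoders produce identical output before comparing times guards against benchmarking two routines that do different work.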

You can also use Android's profiler to see where your problems really are. See TraceView.


Cheryl Simon


I seem to have touched off a religious war... Sorry about that, and thanks for the rationality. – Rich Oct 28 '10 at 12:29
Some of the NIO stuff is pretty horrid at the moment. Improvements have been made and will start showing up in future releases. In the mean time, "test & measure" is a good answer. – fadden Oct 29 '10 at 23:52

score 0 · Answer 3 · answered Oct 29 '10 at 02:29


I would use DataInputStream for this task, wrapped around a ByteArrayInputStream wrapped around the byte array. Encapsulates the shifting in readShort() without the overheads of ByteBuffer.
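As a sketch of the stream-based approach just described, the helper below decodes a packet's payload with DataInputStream.readShort(), which reads big-endian values. StreamDecodeSketch and its decode method are hypothetical names, and the offset and count parameters are assumptions about how the payload would be described to the decoder.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration class, not part of the original answer.
class StreamDecodeSketch {

    /** Reads `count` big-endian shorts from `buf`, starting at `offset`. */
    static short[] decode(byte[] buf, int offset, int count) {
        DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(buf, offset, count * 2));
        short[] out = new short[count];
        try {
            for (int i = 0; i < count; i++) {
                out[i] = in.readShort();   // two bytes, high byte first
            }
        } catch (IOException e) {
            // An in-memory stream only fails here if fewer than count * 2 bytes are available.
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] payload = {0x00, 0x01, 0x7F, (byte) 0xFF, (byte) 0x80, 0x00};
        for (short sample : decode(payload, 0, 3)) {
            System.out.println(sample);
        }
    }
}
```

With a DatagramPacket, the offset and byte count would presumably come from packet.getOffset() and packet.getLength().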

Alternatively you could read directly into a DirectByteBuffer via a DatagramChannel, rather than using DatagramPacket & DatagramSocket. At present you're using a mixture of java.net and java.nio.

I wouldn't expect major performance differences between these two approaches, but I would expect them both to be faster than your hybrid approach.


user207421


*Encapsulates the shifting in readShort() without the overheads of ByteBuffer.* readShort has more overhead than the same operation of a ByteBuffer – bestsss Mar 20 '11 at 19:18
@bestsss: 'overheads' plural: especially creation overheads. It is those I am referring to. – user207421 Mar 22 '11 at 11:51
direct ByteBuffer has some allocation (and deallocation, due to finalizer based one) but ByteArrayInputStream has the same allocation cost as ByteBuffer.wrap and as a bonus all the methods are synchronized, plus ByteBuffer tends to enjoy extra intrinsic by the JIT. Also you do not create a direct ByteBuffer for each packet but once per socket (or take it off a pool). – bestsss Mar 22 '11 at 11:58
@bestsss: you're getting pretty far off the point. If ByteBuffer.readShort() is more efficient than DataInputStream.readShort(), given the other overheads of the latter, whatever they may be comparable to, I would like to see some actual evidence. – user207421 Mar 28 '11 at 09:56
@EJP, write your own test, or just look at the impl. – bestsss Mar 28 '11 at 11:27
I did look at the impl, and I saw that it called the Bits class, where there is Java code that looks exactly the same as DataInputStream.readShort(). – user207421 Mar 29 '11 at 07:20
@EJP, you get 2 indirections via a *heavily* virtual method read() of the underlying inputstream, with a bytebuffer you get a direct access to the byte[] array, the read of a byte[] is not performed byte by byte but a 32bits, so you get a single access and the ByteBuffer is designed for intrinsic by the hotspot (like removing the bound checks). If you wish to go further you can either test or just use GDB and see the assembler. Since the question is about android, indeed, I cannot be perfectly sure how they optimize/impl. the ByteBuffer there. – bestsss Mar 29 '11 at 07:45
@bestsss what exactly is a 'heavily' virtual method? as opposed to a method? and how much difference does all this make in practice as opposed to the actual I/O? – user207421 Mar 29 '11 at 09:25
@EJP, the concern has nothing to do w/ the IO (disk/socket access), so it assumes both are in memory/buffered already. The heavily virtual means that the call site (`read()`) in the current case has more than 2 targets. A single target (class) is a static call and easily inlined, 2 targets are usually inlined w/ a class check. More than that requires a call through virtual table (which is the slowest part of readShort) or alternatively for hotspot: inline caches that are still slower. ByteBuffers are always the single target and inlined properly, that has been the main idea to introduce 'em. – bestsss Mar 29 '11 at 09:51
@EJP, i take liberty and include 2 articles on the matter (of the heavily virtual call) some practical test: http://www.javaspecialists.eu/archive/Issue158.html ... and the explanation of Cliff Click: http://www.azulsystems.com/blog/cliff/2010-04-08-inline-caches-and-call-site-optimization – bestsss Mar 29 '11 at 09:57
