Reading bytes in Java

Question

I am trying to understand how the following line of code works:

for (int i = 0; i < numSamples; i++) {
    short ampValue = 0;
    for (int byteNo = 0; byteNo < 2; byteNo++) {
        ampValue |= (short) ((data[pointer++] & 0xFF) << (byteNo * 8));
    }
    amplitudes[i] = ampValue;
}

As far as I understand, this reads 2 bytes per sample and combines them, i.e. ampValue is composed of two byte reads. data is the actual sample data (from the file), and pointer is incremented to read through it up to the last sample. But I don't understand this part:

"(data[pointer++] & 0xFF) << (byteNo * 8)"

Also, I am wondering whether it makes any difference if I want to read this as a double instead of a short?

I know it is 16-bit little-endian format, but I don't know much about endianness, nor do I think that is going to be a problem. @huseyinTugrulBuyukisik — Nate Raswar, Aug 24 '14 at 18:45
@huseyintugrulbuyukisik, the endianness is hard-coded in this example: It always interprets the bytes in little-endian order regardless of the platform architecture. That is exactly what you would want if you knew that you were reading from a known, little-endian file format. — Solomon Slow, Aug 24 '14 at 19:27

score 0 · Answer 1 · answered Aug 24 '14 at 18:48

In Java, all bytes are signed. The expression (data[pointer++] & 0xFF) converts the signed byte value to an int with the value of the byte if it were unsigned. Then the expression << (byteNo * 8) left-shifts the resulting value by zero or eight bits depending on the value of byteNo. The value of the whole expression is assigned with bitwise or to ampValue.
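To make the masking step concrete, here is a minimal sketch. The class name UnsignedByteDemo and the helper toUnsigned are made up for illustration and are not part of the question's code.

```java
class UnsignedByteDemo {
    // Hypothetical helper: returns the value b would have if it were unsigned (0..255).
    static int toUnsigned(byte b) {
        // b is promoted to int (sign-extended), then the mask keeps only the low 8 bits.
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xF0;                     // bit pattern 1111 0000
        System.out.println(b);                    // prints -16 (signed interpretation)
        System.out.println(toUnsigned(b));        // prints 240 (unsigned interpretation)
        System.out.println(toUnsigned(b) << 8);   // prints 61440 (240 * 256)
    }
}
```

Without the mask, shifting the sign-extended value would drag a run of 1 bits into the upper part of the result, which is why the mask comes before the shift.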

There appears to be a bug in this code. The value of ampValue is not reset to zero between iterations. And amplitude is not used. Are those identifiers supposed to be the same?

Yes, amplitude is ampValue, will fix straight away! — Nate Raswar, Aug 24 '14 at 18:53

score 0 · Accepted Answer · answered Aug 24 '14 at 18:48

Looks like data[] is the array of bytes.

data[pointer++] gives you a byte value in the range [-128..127].

0xFF is an int constant, so...

data[pointer++] & 0xFF promotes the byte value to an int value in the range [-128..127]. Then the & operator zeroes out all of the bits that are not set in 0xFF (i.e., it zeroes out the upper 24 bits, leaving only the low 8 bits).

The value of that expression now will be in the range [0..255].

The << operator shifts the result to the left by (byteNo * 8) bits. That's the same as saying, it multiplies the value by 2 raised to the power of (byteNo * 8). When byteNo==0, it will multiply by 2 to the power 0 (i.e., it will multiply by 1). When byteNo==1, it will multiply by 2 to the power 8 (i.e., it will multiply by 256).
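A small sketch of the shift-as-multiplication idea, applied to one pair of bytes. The class name LittleEndianSample and the method combine are hypothetical names chosen for this example; the return type is short to match ampValue in the question.

```java
class LittleEndianSample {
    // Hypothetical helper: combines two bytes, low-order byte first,
    // into one signed 16-bit value.
    static short combine(byte low, byte high) {
        int lowPart = (low & 0xFF) << 0;    // shift by 0 bits: multiply by 1
        int highPart = (high & 0xFF) << 8;  // shift by 8 bits: multiply by 256
        return (short) (lowPart | highPart);
    }

    public static void main(String[] args) {
        System.out.println((0x12 << 8) == 0x12 * 256);           // prints true
        System.out.println(combine((byte) 0x34, (byte) 0x12));   // prints 4660 (0x1234)
        System.out.println(combine((byte) 0xFF, (byte) 0xFF));   // prints -1
    }
}
```

The last line prints -1 rather than 65535 only because the result is narrowed to short, which is a signed type.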

This loop assembles a 16-bit value from each pair of bytes in the array, taking the first member of each pair as the low-order byte and the second member as the high-order byte. As an unsigned quantity that bit pattern covers the range [0..65535], but because ampValue is declared short (a signed 16-bit type), the stored value is in the range [-32768..32767].

It won't work to declare ampValue as double, because the |= operator will not work on a double, but you can declare the amplitudes[] array to be an array of double, and the assignment amplitudes[i] = ampValue will implicitly promote the value to a double in the range [-32768.0..32767.0].
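As a rough sketch of that suggestion: keep ampValue as a short for the bit operations and only convert when storing into a double[] array. The class name SamplesToDouble, the method decode, and the scale factor 32768.0 are assumptions made for this example. Dividing by 32768.0 is a common convention for mapping 16-bit samples into [-1.0, 1.0); drop the division if raw values are wanted.

```java
class SamplesToDouble {
    // Hypothetical helper: decodes 16-bit little-endian samples into doubles.
    static double[] decode(byte[] data) {
        int numSamples = data.length / 2;   // 2 bytes per sample
        double[] amplitudes = new double[numSamples];
        int pointer = 0;
        for (int i = 0; i < numSamples; i++) {
            short ampValue = 0;             // bit operations need an integer type
            for (int byteNo = 0; byteNo < 2; byteNo++) {
                ampValue |= (short) ((data[pointer++] & 0xFF) << (byteNo * 8));
            }
            // Assumed scaling: the short is promoted to double by the division.
            amplitudes[i] = ampValue / 32768.0;
        }
        return amplitudes;
    }
}
```

For example, the bytes 0x00, 0x40 form the sample 0x4000 (16384), which this sketch stores as 0.5.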

Additional info: Don't overlook @KevinKrumwiede's comment about a bug in the example.

Is there some other way I can read directly from the bytes as double without using the |= operator. I found another example here: http://stackoverflow.com/questions/13024683/obtain-wave-pattern-of-a-audio-file-in-java/13024779#13024779 — Nate Raswar, Aug 24 '14 at 18:59
@NateRaswar, assuming that ampValue is properly zeroed, then in this particular case, the `|=` operator will give the same answer as `+=`. The | operator, which operates only on int-like values, gives you _bitwise or_. That is, bit n will be set in the result iff bit n in either operand is set. The way the math works in this specific case, there will be no overlap between the two values that are ored together, so the bitwise-or effectively is the same as adding the two operands. — Solomon Slow, Aug 24 '14 at 19:23
Thanks a lot! It did work, I am now reading directly from the data as double with the `+=` operator. I had to add `/32676.0` at the end. These two modifications and I am reading the double values exactly as I did in Matlab. — Nate Raswar, Aug 24 '14 at 20:17

Andy Davies · Answer 3 · 2014-08-24T21:01:18.653

Let's break down the statement:

|= is the bitwise or and assignment operator. a |= b is equivalent to a = a | b.
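(short) casts the int result of the masked and shifted expression to a short. (The byte from the data array has already been promoted to int by the & with the int constant 0xFF.)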
pointer++ is a post-increment operation. The value of pointer will be returned and used and then immediately incremented every single time it's accessed in this fashion - this is beneficial in this case because the outer-loop is cycling through 2-byte samples (via the inner loop) from the contiguous data buffer, so this keeps incrementing.
& is the bitwise AND operator and 0xFF is the hexadecimal value for the byte 0b11111111 (255 in decimal); the expression data[pointer++] & 0xFF ANDs each of the low 8 bits of the byte retrieved from the data array with 1 and clears every bit above them. In this context, it works around the fact that Java's byte type is signed (i.e. values from -128 to 127 in decimal): the result is an int holding the value the byte would have if it were unsigned (i.e. values from 0 to 255 in decimal).
Since your samples are 2 bytes long, you need to shift the second lot of 8 bits left, as the most significant bits, using the left bit-shift operator <<. The byteNo * 8 ensures that you're only shifting bits when it's the second of the two bytes.

After the two bytes have been read, ampValue will now contain the value of the sample as a short.
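One way to sanity-check the loop is to compare it with java.nio.ByteBuffer set to little-endian order. This is a different technique from the one in the question, shown here only as a cross-check; the class and method names are made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

class ByteBufferSamples {
    // Manual decoding, following the loop from the question.
    static short[] decodeManually(byte[] data) {
        short[] amplitudes = new short[data.length / 2];
        int pointer = 0;
        for (int i = 0; i < amplitudes.length; i++) {
            short ampValue = 0;
            for (int byteNo = 0; byteNo < 2; byteNo++) {
                ampValue |= (short) ((data[pointer++] & 0xFF) << (byteNo * 8));
            }
            amplitudes[i] = ampValue;
        }
        return amplitudes;
    }

    // Alternative: let ByteBuffer assemble each little-endian short.
    static short[] decodeWithByteBuffer(byte[] data) {
        ByteBuffer buffer = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        short[] amplitudes = new short[data.length / 2];
        for (int i = 0; i < amplitudes.length; i++) {
            amplitudes[i] = buffer.getShort();  // reads 2 bytes and advances the position
        }
        return amplitudes;
    }

    public static void main(String[] args) {
        byte[] data = {0x34, 0x12, (byte) 0xFF, (byte) 0xFF, 0x00, (byte) 0x80};
        System.out.println(Arrays.toString(decodeManually(data)));   // prints [4660, -1, -32768]
        System.out.println(Arrays.equals(decodeManually(data),
                                         decodeWithByteBuffer(data))); // prints true
    }
}
```

Both methods yield the same shorts for this input, which matches the description of the loop as a hard-coded little-endian read.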

just fixed the bug, ampValue was meant to be amplitude. It makes sense now. However, do you know how I can read this as a double? It says in eclipse that the operator |= is undefined for the type double! — Nate Raswar, Aug 24 '14 at 19:05
"(short) casts the (presumably) Byte object from the data array to a short" No, The value that is being cast to (short) is an (int) value. The bytes from the array are promoted to (int) when they are bitwise-anded with the (int) constant, 0xFF. — Solomon Slow, Aug 24 '14 at 19:30
@NateRaswar, ah right, ok! I'll edit my answer now that that's cleared up. A `short` is a type of integer whereas `double` is a floating point number so bitwise operations don't make a huge amount of sense, presumably why it's not allowed in Java. You could always cast `ampValue` to a `double` variable in the next line if you need it as a double. @jameslarge, oh yeah, my bad, of course haha. I'll update that, cheers for pointing it out. — Andy Davies, Aug 24 '14 at 20:59
