ByteBuffer - compareTo method might diverge

Question

Based on the article here, the compareTo method on ByteBuffers might not work correctly when dealing with negative numbers:

bytes in Java are signed, contrary to what one typically expects. What is easy to miss
though, is the fact that this affects ByteBuffer.compareTo() as well. The Java API
documentation for that method reads:

"Two byte buffers are compared by comparing their sequences of remaining elements 
lexicographically, without regard to the starting position of each sequence within its
corresponding buffer."

A quick reading might lead one to believe the result is what you would typically expect, 
but of course given the definition of a byte in Java, this is not the case. The result 
is that the order of byte buffers that contains values with the highest order bit set,   
will diverge from what you may be expecting.

I tried a couple of examples, putting negative values into the buffer and comparing them with positive ones, and the result was always OK. Is the article talking about the case where we read in binary data, e.g. when an integer -1 was stored as 100000...001, and that would cause issues?

I'd say what you might expect would be that a byte buffer containing a negative byte would be "smaller" (in the context of compareTo) than a byte buffer containing a positive byte but due to the signedness it would be the other way round, i.e. -1 > 1 because `11111111111111111111111111111111` > `00000000000000000000000000000001`. — Thomas, Mar 07 '14 at 16:05
There is no mystery. The Javadoc of ByteBuffer.compareTo() states that bytes are compared as if via Byte.compareTo(), which in turn specifies a signed comparison. — user207421, Mar 07 '14 at 21:51

score 1 · Answer 1 · answered Mar 07 '14 at 18:37

The article is claiming that a ByteBuffer containing bytes with their high bits set (i.e., bytes with values in the range 0x80 to 0xFF) will be treated as negative values when compared to the corresponding bytes in another buffer. In other words, each byte is treated as an 8-bit signed integer. Thus you should expect that a byte value of 0x90 will compare less than a byte value of 0x30.

At least, that's the theory, given the standard behavior of byte values in Java. In practice, I would expect that bytes would be compared as 8-bit unsigned integers, so that a byte of 0x90 would compare greater than a byte of 0x30.

It all depends on how the term "lexicographical order" is to be interpreted. If each byte represents an 8-bit character code, for example, then it should logically be treated as an unsigned value (regardless of how Java normally treats byte objects).

So it boils down to how the comparison of two ByteBuffers is actually implemented. One way is to treat the bytes as Java signed integers, as:

// Note: Code has been simplified for brevity
int compare(byte[] buf1, byte[] buf2)
{
    ...
    for (int i = 0;  i < buf1.length  &&  i < buf2.length;  i++)
    {
        int cmp = buf1[i] - buf2[i];        // Signed arithmetic
        if (cmp != 0)
            return cmp;
    }
    return (buf1.length - buf2.length);
}

The other way is to treat the bytes as unsigned integers, which uses slightly different arithmetic for the comparison:

        int cmp = (buf1[i] & 0xFF) - (buf2[i] & 0xFF);  // Unsigned arithmetic

As I said above, I would expect the second approach to be used by most implementations, without any specific definition of "lexicographical order" given.
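To make the difference between the two approaches concrete, here is a minimal, self-contained sketch of both loops. The class and method names (`ByteOrderSketch`, `compareSigned`, `compareUnsigned`) are made up for illustration and are not part of any JDK API; the sketch only shows how the two kinds of arithmetic order the bytes 0x90 and 0x30, not which one `ByteBuffer.compareTo()` actually uses.

```java
class ByteOrderSketch {
    // Signed comparison: each byte is sign-extended to int before subtracting.
    static int compareSigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = a[i] - b[i];
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    }

    // Unsigned comparison: masking with 0xFF maps each byte to the range 0..255.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] high = {(byte) 0x90}; // -112 as a signed byte, 144 as unsigned
        byte[] low = {0x30};         // 48 either way

        System.out.println(compareSigned(high, low));   // negative: -112 - 48
        System.out.println(compareUnsigned(high, low)); // positive: 144 - 48
    }
}
```

The two results have opposite signs, which is exactly the divergence the article describes.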

I've downvoted as it does not handle the default case for Java SE, which could be easily tested. It's annoying that the Java API is not sufficient though. I'll file a bug report if it isn't there yet. — Maarten Bodewes, Jul 12 '14 at 16:29
@Maarten Bodewes, but the answer correctly explains the point that was meant in the quoted article - how lexicographic order is affected depending on whether bytes are treated as signed or unsigned. Upvoted. — uvsmtid, Mar 14 '21 at 11:27
@uvsmtid Instead of arguing with you I have [created another answer](https://stackoverflow.com/a/66624310/589259) based on an earlier comment. I hope you see what I'm trying to say - I didn't have time when this was posted to write an extensive answer or look more deeply, but I knew it to be wrong. — Maarten Bodewes, Mar 14 '21 at 12:05
Thanks, this was the key to my problem. I was doing a compareTo on EBCDIC numbers, which are represented as F0, F1, F2, ..., and Java's default widening of byte to int in Byte.compare(x, y), which uses "return x-y", failed in my case because the value should have been 0x000000F1 and not 0xFFFFFFF1. For my solution I had to use Byte.compareUnsigned(x, y), which resolved my issue. — mahee96, Nov 13 '21 at 10:53

Maarten Bodewes · Answer 2 · 2021-03-14T12:23:50.747

As mentioned in the comments below the question, the description of public int compareTo(ByteBuffer that) reads:

... Two byte buffers are compared by comparing their sequences of remaining elements lexicographically, without regard to the starting position of each sequence within its corresponding buffer. Pairs of byte elements are compared as if by invoking Byte.compare(byte,byte). ...

which leads to public int compareTo(Byte anotherByte):

... returns the value 0 if this Byte is equal to the argument Byte; a value less than 0 if this Byte is numerically less than the argument Byte; and a value greater than 0 if this Byte is numerically greater than the argument Byte (signed comparison [emphasis mine]). ...

So in the JavaDoc description the distinction is made pretty clear. Actually, most senior Java developers would expect problems here and would have understood what was happening in the compareTo. So in that sense there is no confusion.

This invalidates the other answer, because there is no room for implementation differences within Java.
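The documented behavior is easy to check on a standard JDK. The following is a small sketch, with the class name `ByteBufferCompareDemo` and the helper `compareSingleBytes` invented for this example; it wraps one byte in each buffer and only looks at the sign of the result, since the exact magnitude is not specified.

```java
import java.nio.ByteBuffer;

class ByteBufferCompareDemo {
    // Hypothetical helper: wraps one byte per buffer and compares the buffers.
    static int compareSingleBytes(int first, int second) {
        ByteBuffer a = ByteBuffer.wrap(new byte[] {(byte) first});
        ByteBuffer b = ByteBuffer.wrap(new byte[] {(byte) second});
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        // 0x90 has its high bit set, so as a Java byte it is -112.
        int result = compareSingleBytes(0x90, 0x30);

        // Signed comparison: the buffer holding 0x90 sorts before the one holding 0x30.
        System.out.println(result < 0); // prints "true"
    }
}
```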

The problem is of course that in most other programming languages the byte is treated as unsigned. Furthermore, a lexicographical comparison of byte arrays would normally assume that the bytes are unsigned (larger structures / numbers are unlikely to be created out of bytes that may have negative values).

So an unsuspecting programmer who is used to other languages, or a programmer who just mindlessly assumes unsigned bytes (and directly translates "compare bytes" in a description to ByteBuffer.compareTo(ByteBuffer that)), is in for a surprise.

This second part is what the article alludes to. Treating bytes as signed was arguably a mistake in Java, as you'd generally use the constructs not for memory savings but for IO, and there bytes are commonly treated as unsigned.
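If unsigned ordering is what you need, you have to do the comparison yourself. Below is one possible sketch, assuming Java 9 or later for `Arrays.compareUnsigned`; the class `UnsignedBufferCompare` and its `compareUnsigned` method are hypothetical names for this example. It walks the remaining elements of both buffers with absolute `get` calls, so the buffer positions are left untouched.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class UnsignedBufferCompare {
    // Lexicographic comparison of the remaining bytes, treating each byte as 0..255.
    static int compareUnsigned(ByteBuffer a, ByteBuffer b) {
        int n = Math.min(a.remaining(), b.remaining());
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(
                    Byte.toUnsignedInt(a.get(a.position() + i)),
                    Byte.toUnsignedInt(b.get(b.position() + i)));
            if (cmp != 0) {
                return cmp;
            }
        }
        // All shared bytes are equal: the shorter sequence sorts first.
        return Integer.compare(a.remaining(), b.remaining());
    }

    public static void main(String[] args) {
        byte[] x = {(byte) 0xF1};
        byte[] y = {0x31};

        ByteBuffer bx = ByteBuffer.wrap(x);
        ByteBuffer by = ByteBuffer.wrap(y);

        System.out.println(bx.compareTo(by) < 0);             // true: signed order
        System.out.println(compareUnsigned(bx, by) > 0);      // true: unsigned order
        System.out.println(Arrays.compareUnsigned(x, y) > 0); // true: Java 9+ array variant
    }
}
```

For plain byte arrays, `Arrays.compareUnsigned` is the shorter option; the hand-written loop is only needed when working with buffers directly or on older Java versions.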
