
The short summary

I would expect n & 0xffffffff to yield a 32-bit number, not more. But it's yielding a 64-bit number. Why?

The details

In an Android (Java) app I have the following line of code:

hash = ((hash ^ b) * FNV_PRIME) & 0xffffffff;

When logging the value of hash after this step, I get values like 0x811d68ec0c35c4, 0x342d586144387f57, etc. which are obviously more than 32 bits can hold. They're 64-bit numbers.

hash is of type long. I could give more details about b and FNV_PRIME, but that seems irrelevant to the question. No matter what the value of ((hash ^ b) * FNV_PRIME) is, when we bitwise-AND it with 0xFFFFFFFF, a 32-bit number, we should end up with all zeroes except for the least significant 32 bits. Right?

Is there something implicit going on here with int vs. long data types of intermediate results, and possibly with negative numbers being represented using the high bit?
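A minimal reproduction of the symptom, as a sketch. It assumes the intermediate value is simply some 64-bit long; the value below is one of the logged values, used as a stand-in for ((hash ^ b) * FNV_PRIME). The class and method names are hypothetical.

```java
public class MaskRepro {
    // Applies the mask exactly as written in the question. The literal
    // 0xffffffff is an int here, and it is converted to long before the & runs.
    static long maskAsWritten(long x) {
        return x & 0xffffffff;
    }

    public static void main(String[] args) {
        // Stand-in for the 64-bit intermediate result (one of the logged values).
        long x = 0x342d586144387f57L;
        long masked = maskAsWritten(x);
        System.out.println(Long.toHexString(masked)); // prints 342d586144387f57
        System.out.println(masked == x);              // prints true: the & changed nothing
    }
}
```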

LarsH

1 Answer


OK I seem to have found a solution. And I'm going to guess at why it worked. If someone can shed more light on this I'd be happy to hear it.

The fix: add an L to the hex literal on the right side of the & to mark it as a long:

hash = ((hash ^ b) * FNV_PRIME) & 0xffffffffL;

I tested this and it worked: the code now produces only 32-bit values.
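For context, here is a sketch of a complete 32-bit hash loop using the fixed line. It assumes the code is FNV-1a (the `(hash ^ b) * FNV_PRIME` step matches that variant) with the standard 32-bit offset basis 0x811c9dc5 and prime 16777619. The question doesn't show its actual constants or how b is obtained, so those, and the names below, are assumptions.

```java
import java.nio.charset.StandardCharsets;

public class FnvDemo {
    // Assumed constants: the standard 32-bit FNV-1a offset basis and prime.
    static final long FNV_OFFSET_BASIS = 0x811c9dc5L;
    static final long FNV_PRIME = 0x01000193L; // 16777619

    // Hypothetical helper: 32-bit FNV-1a, returned in the low 32 bits of a long.
    static long fnv1a32(byte[] data) {
        long hash = FNV_OFFSET_BASIS;
        for (byte value : data) {
            int b = value & 0xff; // assumption: each byte is treated as unsigned 0..255
            // The L suffix makes the mask 0x00000000ffffffffL, so only the
            // low 32 bits survive each step.
            hash = ((hash ^ b) * FNV_PRIME) & 0xffffffffL;
        }
        return hash;
    }

    public static void main(String[] args) {
        long h = fnv1a32("a".getBytes(StandardCharsets.US_ASCII));
        System.out.println(Long.toHexString(h)); // prints e40c292c
    }
}
```

Because hash stays below 2^32 and the prime is below 2^25, each product fits comfortably in a long, so masking after every multiplication loses nothing.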

So what was wrong and why did that fix it?

The left side of the & is a long value, because hash is a long (and so is FNV_PRIME, but that shouldn't matter). For & to do its job, its operands need to be of the same type, so Java automatically promotes the right side value, 0xffffffff, from int to long. Since Java's int type is signed, the literal 0xffffffff (all 32 bits set) has the value -1, and widening -1 to a long sign-extends it to 0xffffffffffffffffL. So the above line ends up doing the equivalent of

hash = ((hash ^ b) * FNV_PRIME) & 0xffffffffffffffffL;

and since ANDing with all ones leaves a value unchanged, that's how I ended up getting 64-bit values from it.
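The sign extension described above can be checked directly. This is a small sketch; the class and method names are made up for illustration.

```java
public class WideningDemo {
    // The int literal 0xffffffff has all 32 bits set, so its int value is -1.
    static final int INT_MASK = 0xffffffff;

    // Widening int -> long sign-extends: bit 31 is copied into bits 32..63.
    static long widen(int i) {
        return i; // implicit widening primitive conversion
    }

    public static void main(String[] args) {
        System.out.println(INT_MASK);                       // prints -1
        long widened = widen(INT_MASK);
        System.out.println(Long.toHexString(widened));      // prints ffffffffffffffff
        System.out.println(widened == 0xffffffffffffffffL); // prints true
    }
}
```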

When I instead made the right operand 0xffffffffL, it was already a long, so no promotion was needed and no sign extension happened. The literal has bit 31 set, but the sign bit of a long is bit 63. To put it another way, 0xffffffffL is equivalent to 0x00000000ffffffffL, so the sign bit was not set and the & clears the upper 32 bits.
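A quick sketch of the long-typed mask in action, reusing one of the logged values as sample input (the names are hypothetical):

```java
public class LongMaskDemo {
    // With the L suffix the literal is already a long: 0x00000000ffffffffL.
    static final long LOW_32_BITS = 0xffffffffL;

    static long maskLow32(long x) {
        return x & LOW_32_BITS; // clears bits 32..63
    }

    public static void main(String[] args) {
        System.out.println(LOW_32_BITS == 0x00000000ffffffffL); // prints true
        System.out.println(LOW_32_BITS > 0);                    // prints true: sign bit is clear
        long x = 0x342d586144387f57L;
        System.out.println(Long.toHexString(maskLow32(x)));     // prints 44387f57
    }
}
```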

What's the moral of this story?

Well, I could use help with that. Some ideas:

  • Thoroughly understand how Java decides what data types to use to represent numbers at every intermediate stage throughout every computation. Ugh, that sounds hard, especially when most things "work fine" most of the time.

  • Just muddle through until something doesn't work, and then trace it with a debugger in increasing detail until you find the problem. This assumes that if a program fails, it will fail while still in the developer's hands.

  • Good & thorough unit testing. :-) Not sure if I would have designed a test that would have detected this problem, e.g. a test to assert that the return value of my function was no more than 32 bits long.

  • Pay careful attention to compiler warnings in the IDE. I didn't notice it until late in the game, but eventually I saw that Android Studio had a warning that said:

    'hash = ((hash ^ b) * FNV_PRIME) & 0xffffffff' can be replaced by 'hash = ((hash ^ b) * FNV_PRIME)'

If I had read that earlier I would have been very puzzled, but it would have given me a good clue about the problem.
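On the unit-testing idea, a check like the following would have caught the problem on the very first hashing step. This is a sketch without a test framework. It assumes the standard 32-bit FNV-1a constants (the question doesn't show its own), and fitsIn32Bits, stepIntMask and stepLongMask are hypothetical names.

```java
public class Fits32Check {
    // Hypothetical helper a unit test could assert on:
    // true if no bit above bit 31 is set.
    static boolean fitsIn32Bits(long v) {
        return (v >>> 32) == 0;
    }

    // Assumed constants (standard 32-bit FNV-1a).
    static final long FNV_OFFSET_BASIS = 0x811c9dc5L;
    static final long FNV_PRIME = 0x01000193L;

    // One step with the int mask from the question (widened to -1L, so a no-op).
    static long stepIntMask(long hash, int b) {
        return ((hash ^ b) * FNV_PRIME) & 0xffffffff;
    }

    // The same step with the long mask from the fix.
    static long stepLongMask(long hash, int b) {
        return ((hash ^ b) * FNV_PRIME) & 0xffffffffL;
    }

    public static void main(String[] args) {
        long before = stepIntMask(FNV_OFFSET_BASIS, 'a');
        long after = stepLongMask(FNV_OFFSET_BASIS, 'a');
        System.out.println(Long.toHexString(before) + " " + fitsIn32Bits(before)); // prints 811d68e40c292c false
        System.out.println(Long.toHexString(after) + " " + fitsIn32Bits(after));   // prints e40c292c true
    }
}
```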

LarsH
  • I think your first "moral" is the correct one. If you think keeping track of *this* is hard, try C :) – Federico klez Culloca May 12 '20 at 19:11
  • @FedericoklezCulloca: I learned C long before Java. In some ways this particular task would have been a lot easier using C's unsigned data types. But you're right, the unpredictable type sizes would have made it a headache to get the code to work cross-platform! Of course there are workarounds... – LarsH May 12 '20 at 19:19
  • I always worry about whether or not something like `0xFFFFFFFFL` will get sign-extended. I don't like to worry, so I would write it `((1L<<32)-1)` – Matt Timmermans May 16 '20 at 01:14
  • @Matt, would that help? I think I would still wonder if it could get sign-extended, given the perfect storm of conditions... – LarsH May 16 '20 at 02:27
  • @MattTimmermans - It is not that hard really. Java integer types (apart from `char`) are signed and will be sign extended. All arithmetic and bitwise operations take either `int` or `long` operands and produce a result of that same type. Operands are promoted to make it work. – Stephen C May 16 '20 at 03:00
  • For what it is worth, the semantics are precisely defined by the JLS. (Not that the spec is easy to read. Precise specs rarely are easy to read. It comes with the territory ...) The real problem is that some of the behavior is not exactly intuitive for someone coming from the C / C++ world ... or the mathematical world where number representations are not constrained by hardware considerations. – Stephen C May 16 '20 at 03:06
  • @StephenC I'm sure you're right that the semantics are precisely defined. And yeah, it's not that hard, provided you take the time to (a) learn the semantics well (in every area where it might bite you at least), and then (b) work through them for every expression and operation in your code where it might matter. :-p – LarsH May 16 '20 at 08:58
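To make the comment thread concrete, here is a sketch comparing the spellings discussed: the long literal 0xFFFFFFFFL and Matt Timmermans' `((1L<<32)-1)` are the same value, while the int literal widened to long is -1L, per the promotion rule Stephen C describes. It also shows Integer.toUnsignedLong (Java 8+), a standard-library method not mentioned in the thread that zero-extends an int. The class and field names are hypothetical.

```java
public class MaskEquivalence {
    static final long HEX_LITERAL = 0xFFFFFFFFL;  // long literal: no conversion, no sign extension
    static final long SHIFTED = (1L << 32) - 1;   // the ((1L<<32)-1) spelling from the comments
    static final long WIDENED_INT = 0xFFFFFFFF;   // int literal -1, sign-extended to -1L

    public static void main(String[] args) {
        System.out.println(HEX_LITERAL == SHIFTED);     // prints true
        System.out.println(HEX_LITERAL == WIDENED_INT); // prints false
        System.out.println(WIDENED_INT);                // prints -1

        // Integer.toUnsignedLong zero-extends instead of sign-extending,
        // which matches widening and then masking with 0xFFFFFFFFL.
        int i = 0x80000000;
        System.out.println(Integer.toUnsignedLong(i) == ((long) i & 0xFFFFFFFFL)); // prints true
    }
}
```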