How does a number change when it is too long for the selected datatype in Java?

Question

For example, if this is my input:

byte x=(byte) 200;

This will be the output: -56

if this is my input:

short x=(short) 250000;

This will be the output: -12144

I realize that the output is off because the number does not fit into the datatype, but how can I predict what the output will be in this case? This may be one of the questions in my computer science exam, and I do not understand why exactly 200 changes to -56, and so on.

Read about [Integer Overflow](https://en.wikipedia.org/wiki/Integer_overflow) and [2's complement](https://en.wikipedia.org/wiki/Two%27s_complement) — Bohemian, Nov 06 '22 at 21:11
If a number doesn't fit into a data type, it overflows, or wraps around. So you subtract the type's size from the value until you get a value that fits into the type. 200 is too big for a signed byte so you subtract 256, which gives you -56 — QBrute, Nov 06 '22 at 21:11
The fact that you use casts turns off compiler/runtime checks and assumes you know what you are doing - but then at some point in time one has to learn what all this means. — Queeg, Nov 06 '22 at 23:22

score 2 · Answer 1 · answered Nov 06 '22 at 21:12

The relevant aspects are what overflow looks like, and how the bits that represent the underlying data are treated.

Computers are all bits, grouped together in groups of 8; a group of 8 bits is called a byte.

byte b = 5; for example, is stored in memory as 0000 0101.

Bits can be 0. Or 1. That's it. That's where it ends. And everything is, in the end, bits. This means that '-' is not a thing. Computers do not know what a minus sign is and cannot store it. We need to write code and agree on some sort of meaning to represent negative numbers.

2's complement

So what's -5 in bits? It's 1111 1011. Which seems bizarre. But it's how it works. If you write: byte b = -5;, then b will contain 1111 1011. It is because javac made that happen. Similarly, if you then call System.out.println(b), then the println method gets the bit sequence 1111 1011. Why does the println method decide to print a - symbol and then a 5 symbol? Because it's programmed that way: We all are in agreement that 1111 1011 is -5. So why is that?
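To see these bit patterns yourself, here is a minimal sketch. The class and method names (ByteBits, bits) are made up for illustration; only Integer.toBinaryString and String.format come from the JDK.

```java
class ByteBits {
    // Returns the 8 stored bits of a byte as text, e.g. "00000101" for 5.
    static String bits(byte b) {
        // b & 0xFF keeps only the low 8 bits, so a negative byte does not
        // drag along the extra 1 bits it gets when widened to int.
        String s = Integer.toBinaryString(b & 0xFF);
        // Left-pad with zeros to a width of 8.
        return String.format("%8s", s).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println(bits((byte) 5));   // prints 00000101
        System.out.println(bits((byte) -5));  // prints 11111011
    }
}
```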

Because of a really cool property - signed/unsigned irrelevancy.

The rule is 2's complement: To switch the sign (i.e. turn 5, which is 0000 0101 into -5 which is 1111 1011), you flip every bit, and then add 1 to the end result. Try it with 0000 0101 - and you'll see it's 1111 1011. This algorithm is reversible - apply the same algorithm (flip every bit, then add 1) and you can turn -5 into 5.
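The flip-then-add-1 rule can be tried directly in Java, where ~ is the 'flip every bit' operator. This is only a sketch; the class and method names are hypothetical.

```java
class TwosComplementNegate {
    // Switches the sign the manual way: flip every bit (~), then add 1.
    // The cast back to byte keeps only the low 8 bits of the int result.
    static byte negate(byte b) {
        return (byte) (~b + 1);
    }

    public static void main(String[] args) {
        System.out.println(negate((byte) 5));   // prints -5
        System.out.println(negate((byte) -5));  // prints 5
    }
}
```

One value has no partner: negating -128 this way gives -128 again, because +128 does not fit in a signed byte.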

This 2's complement thing has 2 great advantages:

There is only one 0 value. If we just flipped all bits, we'd have both 1111 1111 and 0000 0000 representing some form of 0. In basic math, there's no such thing as 'negative 0' - it's the same as positive 0. Similarly, if we just decided the first bit is the sign and the remaining 7 bits are the number, then we'd have both 1000 0000 and 0000 0000 being 0, which is annoying and inefficient. Why waste 2 different bit sequences on the same number?
Plus and minus are sign-mode independent. The computer doesn't have to KNOW whether we are doing the 2's complement thing or not. Take the bit sequence 1111 1011. If we treat that as unsigned bits, then that is 251 (it's 128 + 64 + 32 + 16 + 8 + 2 + 1). If we treat that as a signed number, then the first bit is 1, so the thing is negative: We apply 2's complement and figure out that it is -5. So, is it -5 or 251? It's both, at once! Which one it is depends on the human/code that interprets this bit sequence. So how could the computer possibly do a + b given this? The weird answer is: It doesn't matter - because the math works out the same way. 251 - 10 is 241. -5 - 10 is -15. -15 and 241 are the exact same bit sequence.
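The claim that one bit sequence can be read as either -5 or 251, and that subtraction does not care which, can be checked with a small sketch. Byte.toUnsignedInt is a real JDK method (Java 8+); the class name and the minus helper are invented here for illustration.

```java
class SameBits {
    // Subtracts in int arithmetic, then keeps only the low 8 bits (what a byte can hold).
    static byte minus(byte a, int n) {
        return (byte) (a - n);
    }

    public static void main(String[] args) {
        byte b = (byte) 0b1111_1011;                // the bit sequence from the text
        System.out.println(b);                      // signed reading: prints -5
        System.out.println(Byte.toUnsignedInt(b));  // unsigned reading: prints 251

        byte r = minus(b, 10);                      // one subtraction on the bits
        System.out.println(r);                      // signed reading: prints -15
        System.out.println(Byte.toUnsignedInt(r));  // unsigned reading: prints 241
    }
}
```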

Overflow

A byte is 8 bits, and there are 256 different sequences of 8 bits; list those and you have listed each and every possible variant. (2^8 = 256. Likewise, a 16-bit number can convey 65536 different things, because 2^16 is 65536, and so on.) So, given that bytes are 8 bits and we decreed they are signed, and 2's complement signed at that, the smallest number you can store in one is -128, which in bits is 1000 0000 (use 2's complement to check my work), and the largest is +127, which in bits is 0111 1111. So what happens if you add 1 to 127? That would seemingly be +128, except that is not storable in 8 bits if we decree that we interpret these bits as 2's complement signed (which Java does). What happens? The bits 'roll over'. We just add 1 as normal, which turns 0111 1111 into 1000 0000, which is -128:

byte b = 127;
b = (byte)(b + 1);
System.out.println(b); // prints -128

Imagine the number line - stretching out to infinity on both ends, from -infinity to +infinity. That's the usual way math works. Computers (or rather, int, long, etc.) do not work like that. Instead of a line, it is a circle. Take your infinite number line and take some scissors, and snip that number line at -128 (because a 2's complement signed byte cannot represent -129 or anything else below -128), and at +127 (because our byte cannot represent 128 or anything above it).

And now tape the 2 cut ends together.

That's the number line. What's 'to the right' of 125? 126 - that's what +1 means: Move one to the right on the number line.

What's 'to the right' of +127? Why, -128. Because we taped it together.

Similarly, -127 - 5 is +124. '-5' means 'move 5 places to the left on the number line (or rather, number circle)'. Going in steps of 1:

-127 (we start here)
-128 (-127 -1)
+127 (-127 -2)
+126 (-127 -3)
+125 (-127 -4)
+124 (-127 -5)

Hence, 124.
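For predicting a result by hand, as in the question, the number circle boils down to: take the value modulo the number of bit patterns (256 for byte, 65536 for short), and if the result lands in the upper half, subtract that size once more. Here is a sketch of that rule, assuming a hypothetical helper named wrap that is not part of the JDK, checked against real casts:

```java
class WrapPredict {
    // Predicts what `value` becomes when squeezed into a signed
    // 2's complement type that is `bits` wide (8 for byte, 16 for short).
    // Hypothetical helper; only meant for bits below 63.
    static long wrap(long value, int bits) {
        long size = 1L << bits;                // 256 for byte, 65536 for short
        long r = Math.floorMod(value, size);   // position on the circle, 0..size-1
        return r >= size / 2 ? r - size : r;   // upper half reads as negative
    }

    public static void main(String[] args) {
        System.out.println(wrap(200, 8));        // prints -56
        System.out.println((byte) 200);          // prints -56
        System.out.println(wrap(250000, 16));    // prints -12144
        System.out.println((short) 250000);      // prints -12144
        System.out.println(wrap(-127 - 5, 8));   // prints 124
    }
}
```

By hand: 250000 - 3 * 65536 = 53392, which is above 32767, so subtract 65536 once more to get -12144.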

Same math applies to short (-32768 to +32767), char (which is really a 16-bit unsigned number - so 0 to 65535), int (-2147483648 to +2147483647), and even long (-2^63 to +2^63-1 - those get a little large).

short x = 32765;
x += 5;
System.out.println(x); // prints -32766.
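The same wrap-around can be observed at the edges of the other types. This is a sketch; the class and helper names are invented for illustration, while the MAX_VALUE and MIN_VALUE constants are from the JDK.

```java
class OtherTypes {
    static int nextInt(int v) { return v + 1; }       // wraps at Integer.MAX_VALUE
    static long prevLong(long v) { return v - 1; }    // wraps at Long.MIN_VALUE
    static char nextChar(char v) { return (char) (v + 1); }  // unsigned: wraps 65535 to 0

    public static void main(String[] args) {
        System.out.println(nextInt(Integer.MAX_VALUE));                 // prints -2147483648
        System.out.println(prevLong(Long.MIN_VALUE) == Long.MAX_VALUE); // prints true
        System.out.println((int) nextChar((char) 65535));               // prints 0
    }
}
```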
