What algorithm BigInteger use to convert integer to byte array

Question

It seems that Java has a very efficient way of converting Integer to byteArray using BigInteger.toByteArray() method, it is optimal in terms of space.

So for integers in the range of 0-128, it will only use 1 byte, and for integers in the range of 2147483648 - 549755813888 it will use exactly 5 bytes.

What is the algorithm/idea they used to implement this?

Ralf Kleberhoff · Answer 1 · 2018-01-27T14:58:45.473

There's no real algorithm necessary. It's just a matter of copying the existing bits in 8-bit chunks into the target byte array.

The internal representation (* see below) is two's complement binary with infinite length (but only storing the significant lower part, not the infinitely many zeroes or ones). And the same representation is used in the resulting byte array.

Only the internal storage uses 32-bit integer chunks, and the byte array 8-bit chunks.

(*) Of course, a compliant BigInteger class could use whatever internal representation they liked, but the 32-bit integer one is very plausible.

Answering the OP's comment, I'd like to explain with a little more detail what's going on between BigInteger and byte array.

What is a Java int?

Let's start with a simple int. It's a 32-bit entity encoding integer numbers from the range -2147483648 to 2147483647 (-2^31 to 2^31 - 1) using "two's complement". Some examples both in binary and in hex:

-2147483648 is encoded as 1000 0000 0000 0000 0000 0000 0000 0000 or 0x80000000.
        -16 is encoded as 1111 1111 1111 1111 1111 1111 1111 0000 or 0xfffffff0.
         -2 is encoded as 1111 1111 1111 1111 1111 1111 1111 1110 or 0xfffffffe.
         -1 is encoded as 1111 1111 1111 1111 1111 1111 1111 1111 or 0xffffffff.
          0 is encoded as 0000 0000 0000 0000 0000 0000 0000 0000 or 0x00000000.
          1 is encoded as 0000 0000 0000 0000 0000 0000 0000 0001 or 0x00000001.
         16 is encoded as 0000 0000 0000 0000 0000 0000 0001 0000 or 0x00000010.
 2147483647 is encoded as 0111 1111 1111 1111 1111 1111 1111 1111 or 0x7fffffff.

Remember, one hex digit represents 4 bits:

Negative numbers start with a 1 bit, positive with a 0, and if the numbers are small, they start with lots of 1 bits or 0 bits. The limitation with primitive types like short, int, or long is that they can only represent some given range of numbers.

What's a BigInteger?

If we had infinitely many bits available for our number representation, then we could represent any number, no matter how large. That's the BigInteger model, just with the one improvement that we don't repeat the lots of leading 1 or 0 bits, but just the interesting trailing part.

The BigInteger class (at least one plausible implementation) represents numbers by arrays of integers, holding only the interesting part of the bit representation. Some examples (using only hex now, as line width is limited...):

                   0 [0x00000000]
          2147483647 [0x7fffffff]
          2147483648 [0x00000000, 0x80000000]
          4294967296 [0x00000001, 0x00000000]
          8589934592 [0x00000002, 0x00000000]
     281474976710655 [0x0000ffff, 0xffffffff]
     281474976710656 [0x00010000, 0x00000000]
18446744073709551616 [0x00000001, 0x00000000, 0x00000000]
    -281474976710656 [0xffff0000, 0x00000000]

Extracting the byte array

Creating the byte array just means splitting the integers every 8 bits (= 2 hex digits)

Let's start with an example:

8589934592 [0x00 00 00 02, 0x00 00 00 00]
bytes:     [           02,   00,00,00,00]

So, it's:

Knowing that the first three 00 bytes can be omitted. In our example it's because the first integer starts with at least 25 bits of 0.
Copying the bits from the integers to the bytes.

Without the leading-zeroes test, the byte array would be 8 bytes long, just creating four bytes per integer.

So, the only "algorithm" part of creating the byte array is checking for some number of leading 0 bits (or 1 bits in case of a negative number).

user207421 · Answer 2 · 2018-01-26T16:28:25.490

0

The algorithm is the standard technique for multiprecision radix conversion: repeated division by the target radix and storage of the remainder on each iteration.

See for example Knuth, D.E., The Art of Computer Programming, Vol II.

edited Jan 26 '18 at 16:28

answered Jan 26 '18 at 16:09

user207421

305,947
44
307
483

Can you please provide some official source link – Ilya Gazman Jan 26 '18 at 16:10
Define 'official source link'. I have already provided the most 'official' *citation* possible. – user207421 Jan 26 '18 at 16:20
Yeah, your mention is good, I commented before you added it. Let me just go over the article to find what is relevant to my question and understand what is it that you are talking about lol. – Ilya Gazman Jan 26 '18 at 16:23

What algorithm BigInteger use to convert integer to byte array

2 Answers2