
I have found through various Stack Q&As that a Base64-encoded 256-bit number has one = of padding, and that the character just before the padding can only be one of AEIMQUYcgkosw048.

I'm fairly confident that a Base64-encoded 512-bit number has two = characters of padding, because 64 bytes leave a remainder of 1 when divided into groups of 3.

For Base64 encoded 512-bit numbers, what is the range for the final character? The modulus of the quotient of the bits is the same, so does that mean that the final character range is the same for both 256-bit encoded and 512-bit encoded?

This is for space conservation and regexing of readable Ed25519 signatures.


Specifically, I'm converting Java byte[64]s to Strings with org.apache.commons.codec.binary.Base64's encodeBase64.

  • How are your *256-bit number* and *512-bit number* encoded? Are they encapsulated in some ASN.1 BER INTEGER envelope, or are they merely the naked bytes? Are leading 0-bytes dropped or not? Is there a need for an additional bit to prevent signed/unsigned troubles? – mkl Jan 22 '14 at 08:36

1 Answer

I am assuming here that the 256-bit and 512-bit numbers in question are encoded using exactly 32 or 64 bytes respectively (i.e. no dropping of leading zeros, no additional bit to prevent signed/unsigned issues, no ASN.1 BER encoding header, ...).

Base64 uses 4 characters for each byte triple, each character representing 6 bits of the data:

        byte #1    |    byte #2    |    byte #3
bit 7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0

becomes

bit 5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0
      char #1  |  char #2  |  char #3  |  char #4

Which char is used for which 6-tuple of bits is specified by means of a table, cf., e.g., the Wikipedia article.
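The bit layout above can be sketched in a few lines of Java. This is only an illustration of the mapping, not how any particular library implements it; the class name `Base64Triple` and the method `encodeTriple` are made up for this example, and the alphabet string is the standard one from RFC 4648.

```java
public class Base64Triple {
    // Standard Base64 alphabet (RFC 4648); index = value of the 6-bit group.
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Hypothetical helper: encodes exactly three bytes as four characters.
    static String encodeTriple(byte b1, byte b2, byte b3) {
        // Pack the three bytes into one 24-bit value, byte #1 on top.
        int bits = ((b1 & 0xFF) << 16) | ((b2 & 0xFF) << 8) | (b3 & 0xFF);
        char[] out = new char[4];
        for (int i = 0; i < 4; i++) {
            // Take the 6-bit groups from the most significant end downwards.
            int sextet = (bits >> (18 - 6 * i)) & 0x3F;
            out[i] = ALPHABET.charAt(sextet);
        }
        return new String(out);
    }

    public static void main(String[] args) {
        // The three ASCII bytes of "Man" encode to "TWFu".
        System.out.println(encodeTriple((byte) 'M', (byte) 'a', (byte) 'n'));
    }
}
```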

Thus, in the case of the 256-bit number, 32 bytes have to be encoded, i.e. 11 character quadruples are used, the last of which encodes only 2 instead of the maximum of 3 bytes, i.e. only 16 bits of data. The last character (for which there is no data) is therefore a =, and the second-to-last character (for which there is data only for the top 4 bits) can only be one representing a 6-tuple of bits whose two lowest bits are 0, i.e. the characters you enumerated.

And in the case of the 512-bit number, 64 bytes have to be encoded, i.e. 22 character quadruples are used, the last of which encodes only 1 instead of the maximum of 3 bytes, i.e. only 8 bits of data. The last two characters (for which there is no data) are therefore both =, and the second character of that last quadruple (for which there is data only for the top 2 bits) can only be one representing a 6-tuple of bits whose four lowest bits are 0, i.e. the characters AQgw.
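Both character sets can be checked empirically. The sketch below is an assumption-laden illustration: it uses `java.util.Base64` from the JDK (Java 8+) rather than the Commons Codec class mentioned in the question, on the assumption that both produce standard padded Base64. The class `Base64Tail`, the method `lastDataChars` and the pattern `SIGNATURE_64` are names invented for this example.

```java
import java.util.Base64;
import java.util.TreeSet;
import java.util.regex.Pattern;

public class Base64Tail {
    // Hypothetical regex for a Base64-encoded byte[64]: 85 unconstrained
    // characters, one of AQgw, then two padding characters (88 in total).
    static final Pattern SIGNATURE_64 =
        Pattern.compile("^[A-Za-z0-9+/]{85}[AQgw]==$");

    // Collects every character that can appear directly before the padding
    // when encoding an array of the given length. Assumes the length is not
    // a multiple of 3, so that the output actually contains padding.
    static String lastDataChars(int length) {
        TreeSet<Character> seen = new TreeSet<>();
        byte[] data = new byte[length];
        // Only the final byte feeds the partially filled character,
        // so varying it over all 256 values is sufficient.
        for (int v = 0; v < 256; v++) {
            data[length - 1] = (byte) v;
            String encoded = Base64.getEncoder().encodeToString(data);
            seen.add(encoded.charAt(encoded.indexOf('=') - 1));
        }
        StringBuilder sb = new StringBuilder();
        for (char c : seen) {
            sb.append(c);
        }
        return sb.toString(); // characters in ASCII order
    }

    public static void main(String[] args) {
        System.out.println(lastDataChars(32)); // 048AEIMQUYcgkosw
        System.out.println(lastDataChars(64)); // AQgw
        String sample = Base64.getEncoder().encodeToString(new byte[64]);
        System.out.println(SIGNATURE_64.matcher(sample).matches()); // true
    }
}
```

The first line of output lists the same 16 characters as in the question, only sorted in ASCII order (digits first).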

As mentioned above, though, I made certain assumptions on the encoding of the numbers...

mkl
  • Thank you mkl! I think you've already solved it, but just in case, please note edit. Thank you very much in advance! –  Jan 22 '14 at 14:07
  • According to your edit, you always base64-encode a `byte[64]` as is. This coincides with my assumptions for the *512-bit numbers*. – mkl Jan 22 '14 at 14:57