
I have a bunch of strings in the DB which were encoded with the sun.misc.BASE64Encoder a while ago.

Now I want to decode them (and encode further strings) with java.util.Base64.

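The difference between the two is that the Sun encoder inserted a line separator at fixed intervals (every 76 characters, as far as I can tell), while the basic JDK 8 encoder emits a single unbroken line.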

Example:

Sun Base64:   54y49568uyj304j534w5y
              34y0639j6yh93j5h0653j
              s45hr68o

JDK8 Base64:  54y49568uyj304j534w5y34y0639j6yh93j5h0653js45hr68o

In order for the JDK decoder to parse these zipped strings, I would need to remove the new line characters.

Questions:

  1. Do I remove \r\n (Windows), \n (Unix), or \r (old Macs)? The line separator in each string depends on the machine on which it was encoded.

  2. If I say zippedString.replaceAll("\r", "").replaceAll("\n", ""), how can I make sure that I won't have a \r character in the actual string, resulting in corrupted data?

  3. Is there any other way to create a bridge between these two mechanisms?

Georgian
  • 2
    There is no `\r\n` in a base 64 encoded string so you can probably remove them (it depends on the encoding process, for instance, one might encode word by word or line by line), see https://tools.ietf.org/html/rfc4648#section-4 –  Mar 30 '16 at 11:18
  • https://en.wikipedia.org/wiki/Base64: no `\r` or `\n` – Thomas Mar 30 '16 at 11:20
  • 1
    Have you tried using the Java 8 mime decoder? It is documented to ignore newlines. – JB Nizet Mar 30 '16 at 11:37
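The MIME decoder suggested in the last comment can be sketched as follows. This is a minimal illustration, not a drop-in fix: the class name `MimeDecodeSketch`, the helper `decodeLegacy`, and the sample payload are made up for the example. `Base64.getMimeDecoder()` is documented to ignore line separators and any other characters outside the Base64 alphabet, so mixed `\r\n` and `\n` breaks should not matter.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class MimeDecodeSketch {

    // Hypothetical helper: decodes Base64 text that may contain line breaks.
    // The MIME decoder skips characters that are not in the Base64 alphabet.
    static byte[] decodeLegacy(String encoded) {
        return Base64.getMimeDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        // Sample value with mixed line separators, standing in for a stored string.
        String legacy = "SGVsbG8s\r\nIFdvcmxk\nIQ==";
        byte[] decoded = decodeLegacy(legacy);
        System.out.println(new String(decoded, StandardCharsets.UTF_8)); // prints "Hello, World!"
    }
}
```

For new strings, `Base64.getEncoder()` produces output without line breaks, so nothing needs to be stripped when they are decoded later.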

2 Answers


There is no whitespace in Base64, so I would remove it all.

String toDecode = str.replaceAll("\\s+", "");

This removes any ambiguity on how to handle specific newlines or spaces.
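A slightly fuller sketch of this approach, assuming the stored value is available as a `String`; the class name `StripWhitespaceSketch`, the helper `decodeLegacy`, and the sample payload are illustrative only. Because the Base64 alphabet consists of A-Z, a-z, 0-9, `+`, `/` and the padding character `=`, a `\r` or `\n` can never be part of the encoded data, so stripping whitespace cannot corrupt it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class StripWhitespaceSketch {

    // Hypothetical helper: removes every whitespace character, then decodes
    // with the basic decoder, which rejects anything outside the alphabet.
    static byte[] decodeLegacy(String stored) {
        String compact = stored.replaceAll("\\s+", "");
        return Base64.getDecoder().decode(compact);
    }

    public static void main(String[] args) {
        // Sample value with Windows-style line breaks, standing in for a DB row.
        String legacy = "SGVsbG8s\r\nIFdvcmxk\r\nIQ==\r\n";
        byte[] decoded = decodeLegacy(legacy);
        System.out.println(new String(decoded, StandardCharsets.UTF_8)); // prints "Hello, World!"
    }
}
```

Without the `replaceAll` step, `Base64.getDecoder().decode(...)` would throw an `IllegalArgumentException` on the first line break.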

Peter Lawrey

The purpose of ‘\r’ and ‘\n’ characters here is purely related to text formatting and you can assume they’re invisible.

The code that reads the string value should take care of filtering them out, whatever method is used, e.g.

read line -> trim -> concatenate

Then, decode the concatenated string.
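The read, trim, concatenate steps above could look roughly like this in Java. It is only a sketch: the class name `LineJoinSketch` and the helpers `joinLines` and `decodeLegacy` are invented for the example, and it assumes the encoded value is already in memory as a `String`. `BufferedReader.readLine()` treats `\n`, `\r`, and `\r\n` as line terminators, so the platform the string was encoded on does not need to be known.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class LineJoinSketch {

    // Hypothetical helper: read line -> trim -> concatenate.
    static String joinLines(String stored) {
        StringBuilder joined = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new StringReader(stored))) {
            String line;
            while ((line = reader.readLine()) != null) {
                joined.append(line.trim());
            }
        } catch (IOException e) {
            // A StringReader should not fail here; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
        return joined.toString();
    }

    // Decodes the concatenated string with the basic JDK 8 decoder.
    static byte[] decodeLegacy(String stored) {
        return Base64.getDecoder().decode(joinLines(stored));
    }

    public static void main(String[] args) {
        // Sample value mixing all three line-separator styles.
        String legacy = "SGVsbG8s\r\nIFdvcmxk\rIQ==\n";
        System.out.println(new String(decodeLegacy(legacy), StandardCharsets.UTF_8)); // prints "Hello, World!"
    }
}
```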

As a digression, suppose a different encoding algorithm were used in which these characters were part of the encoding alphabet. Because Java interprets ‘\r’ and ‘\n’ as line terminators (which one is used depends on the operating system), the string representation of the encoded value would need to be escaped, e.g. “14y6\\n75b….”. That would cause more problems than it solves, so such an alphabet should be avoided.

RZet