5

I wanted to use Base64.java to encode and decode files. Encoder.wrap(OutputStream) and Decoder.wrap(InputStream) worked but ran slowly, so I used the following code.

public static void decodeFile(String inputFileName,
        String outputFileName)
        throws FileNotFoundException, IOException {

    Base64.Decoder decoder = Base64.getDecoder();
    InputStream in = new FileInputStream(inputFileName);
    OutputStream out = new FileOutputStream(outputFileName);

    byte[] inBuff = new byte[BUFF_SIZE];  //final int BUFF_SIZE = 1024;
    byte[] outBuff = null;
    while (in.read(inBuff) > 0) {
        outBuff = decoder.decode(inBuff);
        out.write(outBuff);
    }
    out.flush();
    out.close();
    in.close();
}

However, it always throws

Exception in thread "AWT-EventQueue-0" java.lang.IllegalArgumentException: Input byte array has wrong 4-byte ending unit
    at java.util.Base64$Decoder.decode0(Base64.java:704)
    at java.util.Base64$Decoder.decode(Base64.java:526)
    at Base64Coder.JavaBase64FileCoder.decodeFile(JavaBase64FileCoder.java:69)
    ...

After I changed final int BUFF_SIZE = 1024; to final int BUFF_SIZE = 3*1024;, the code worked. Since BUFF_SIZE is also used to encode the file, I believe there was something wrong with the encoded file (1024 % 3 = 1, which means padding is added in the middle of the file).

Also, as @Jon Skeet and @Tagir Valeev mentioned, I should not ignore the return value from InputStream.read(). So, I modified the code as below.

(However, I have to mention that the code does run much faster than using wrap(). I noticed the speed difference because I had written and heavily used Base64.encodeFile()/decodeFile() long before JDK 8 was released. Now my buffered JDK 8 code runs as fast as my original code, so I do not know what is going on with wrap()...)

public static void decodeFile(String inputFileName,
        String outputFileName)
        throws FileNotFoundException, IOException
{

    Base64.Decoder decoder = Base64.getDecoder();
    InputStream in = new FileInputStream(inputFileName);
    OutputStream out = new FileOutputStream(outputFileName);

    byte[] inBuff = new byte[BUFF_SIZE];
    byte[] outBuff = null;
    int bytesRead = 0;
    while (true)
    {
        bytesRead = in.read(inBuff);
        if (bytesRead == BUFF_SIZE)
        {
            outBuff = decoder.decode(inBuff);
        }
        else if (bytesRead > 0)
        {
            byte[] tempBuff = new byte[bytesRead];
            System.arraycopy(inBuff, 0, tempBuff, 0, bytesRead);
            outBuff = decoder.decode(tempBuff);
        }
        else
        {
            out.flush();
            out.close();
            in.close();
            return;
        }
        out.write(outBuff);
    }
}

Special thanks to @Jon Skeet and @Tagir Valeev.

Leo.W

4 Answers

5

I strongly suspect that the problem is that you're ignoring the return value from InputStream.read, other than to check for the end of the stream. So this:

while (in.read(inBuff) > 0) {
    // This always decodes the *complete* buffer
    outBuff = decoder.decode(inBuff);
    out.write(outBuff);
}

should be

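int bytesRead;
while ((bytesRead = in.read(inBuff)) > 0) {
    // Base64.Decoder has no (array, offset, length) overload,
    // so copy only the bytes that were actually read (needs java.util.Arrays)
    outBuff = decoder.decode(Arrays.copyOf(inBuff, bytesRead));
    out.write(outBuff);
}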

I wouldn't expect this to be any faster than using wrap though.
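
For reference, here is a minimal sketch of a chunked decode loop that honors the read count as a complete method. The names ChunkedBase64Decode, readFully and decode are hypothetical, not part of the JDK. It assumes the input is one continuous Base64 stream without line breaks and with padding only at the very end, and it fills the buffer completely before each decode so that every chunk except possibly the last is a multiple of 4 bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

final class ChunkedBase64Decode {

    // Hypothetical helper: keeps reading until buf is full or the stream ends.
    // Returns the number of bytes placed in buf (0 at end of stream).
    static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break;
            }
            total += n;
        }
        return total;
    }

    // Decodes Base64 from in to out, chunkSize encoded bytes at a time.
    static void decode(InputStream in, OutputStream out, int chunkSize) throws IOException {
        if (chunkSize <= 0 || chunkSize % 4 != 0) {
            throw new IllegalArgumentException("chunkSize must be a positive multiple of 4");
        }
        Base64.Decoder decoder = Base64.getDecoder();
        byte[] buf = new byte[chunkSize];
        int n;
        while ((n = readFully(in, buf)) > 0) {
            // Only pass the bytes that were actually read to the decoder.
            byte[] chunk = (n == buf.length) ? buf : Arrays.copyOf(buf, n);
            out.write(decoder.decode(chunk));
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "hello, base64 streams".getBytes(StandardCharsets.UTF_8);
        byte[] encoded = Base64.getEncoder().encode(original);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        decode(new ByteArrayInputStream(encoded), out, 8);
        System.out.println(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Filling the buffer before decoding matters because InputStream.read is allowed to return fewer bytes than requested even in the middle of a stream, and a short chunk that is not a multiple of 4 cannot be decoded on its own.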

Jon Skeet
1

Try using decoder.wrap(new BufferedInputStream(new FileInputStream(inputFileName))). With buffering it should be at least as fast as your manually crafted version.

As for why your code doesn't work: the last chunk is likely to be shorter than 1024 bytes, but you try to decode the whole byte[] array. See @JonSkeet's answer for details.
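
A rough sketch of the buffered wrap() approach for both directions, in case it helps. The class and method names (WrapBase64Files, encodeFile, decodeFile, copy) and the 8192-byte copy buffer are assumptions made for illustration. Only Base64.Encoder.wrap(OutputStream) and Base64.Decoder.wrap(InputStream) come from the JDK.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Base64;

final class WrapBase64Files {

    // Plain stream copy that writes only the bytes actually read.
    private static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    }

    static void encodeFile(Path source, Path target) throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(source));
             OutputStream out = Base64.getEncoder().wrap(
                     new BufferedOutputStream(Files.newOutputStream(target)))) {
            copy(in, out);
        } // closing the wrapped stream writes any final padding
    }

    static void decodeFile(Path source, Path target) throws IOException {
        try (InputStream in = Base64.getDecoder().wrap(
                     new BufferedInputStream(Files.newInputStream(source)));
             OutputStream out = new BufferedOutputStream(Files.newOutputStream(target))) {
            copy(in, out);
        }
    }

    public static void main(String[] args) throws IOException {
        Path raw = Files.createTempFile("raw", ".bin");
        Path b64 = Files.createTempFile("raw", ".b64");
        Path back = Files.createTempFile("back", ".bin");
        Files.write(raw, new byte[] {1, 2, 3, 4, 5, 6, 7});
        encodeFile(raw, b64);
        decodeFile(b64, back);
        System.out.println(Arrays.equals(Files.readAllBytes(raw), Files.readAllBytes(back)));
    }
}
```

With this shape the chunk alignment is handled by the wrapped streams, so the copy buffer size does not need to be a multiple of 3 or 4.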

Tagir Valeev
0

Well, I changed

"final int BUFF_SIZE = 1024;"

into

"final int BUFF_SIZE = 1024 * 3;"

It worked!

So I guess there is probably something wrong with the padding. I mean, when encoding the file (since 1024 % 3 = 1), padding must be added to each chunk, and that might cause problems when decoding.

Leo.W
  • 2
    This could cause encoding problems (as Base64 encodes every 3 bytes into 4), but should not cause decoding problems (in decoding, every 4 bytes are converted to 3). Your problem is probably just better hidden now (for example, it might not throw an exception but silently produce an incorrect result). – Tagir Valeev Sep 30 '15 at 06:07
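
To illustrate the "silently incorrect result" case, here is a small sketch that reproduces the loop from the question against an in-memory stream. StaleBufferDemo and decodeIgnoringReadCount are hypothetical names, and the 8-byte buffer is chosen only to keep the example small. Because the whole buffer is decoded on every pass, a short final read leaves stale bytes from the previous pass in the tail of the buffer, and those get decoded again.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class StaleBufferDemo {

    // The buggy pattern: the return value of read() is only used as an end-of-stream test.
    static byte[] decodeIgnoringReadCount(byte[] encoded, int bufSize) throws IOException {
        Base64.Decoder decoder = Base64.getDecoder();
        InputStream in = new ByteArrayInputStream(encoded);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[bufSize];
        while (in.read(buf) > 0) {
            out.write(decoder.decode(buf)); // decodes stale bytes after a short read
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "abcdefghi".getBytes(StandardCharsets.UTF_8);
        byte[] encoded = Base64.getEncoder().encode(original); // 12 characters, no padding
        byte[] wrong = decodeIgnoringReadCount(encoded, 8);
        System.out.println("expected: " + new String(original, StandardCharsets.UTF_8));
        System.out.println("got:      " + new String(wrong, StandardCharsets.UTF_8));
    }
}
```

Here the second read delivers only 4 bytes, so the last 4 bytes of the first chunk are decoded a second time and the result is 12 bytes long instead of 9, without any exception.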
0
  • You should record the number of bytes you have read. Besides this,
  • Make sure your buffer size is divisible by 3 when encoding, because Base64 turns every 3 input bytes into 4 output characters (64 is 2^6, and 3*8 equals 4*6). This avoids padding problems: the output will not contain a stray "=" ending in the middle.
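
The effect of the buffer size on padding can be checked with a short sketch like the one below. ChunkPaddingDemo, encodeInChunks and hasInteriorPadding are hypothetical names, and the 4000-byte sample array is an arbitrary assumption. It encodes the same data chunk by chunk with two different chunk sizes and looks for "=" characters anywhere before the last two positions.

```java
import java.util.Arrays;
import java.util.Base64;

final class ChunkPaddingDemo {

    // Encodes data chunk by chunk and concatenates the results,
    // the way a buffer-at-a-time file encoder would.
    static String encodeInChunks(byte[] data, int chunkSize) {
        Base64.Encoder encoder = Base64.getEncoder();
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int end = Math.min(data.length, off + chunkSize);
            sb.append(encoder.encodeToString(Arrays.copyOfRange(data, off, end)));
        }
        return sb.toString();
    }

    // True if a '=' appears anywhere other than the last two characters.
    static boolean hasInteriorPadding(String encoded) {
        int first = encoded.indexOf('=');
        return first >= 0 && first < encoded.length() - 2;
    }

    public static void main(String[] args) {
        byte[] data = new byte[4000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        String whole = Base64.getEncoder().encodeToString(data);
        for (int chunkSize : new int[] {1024, 3 * 1024}) {
            String chunked = encodeInChunks(data, chunkSize);
            System.out.println("chunk size " + chunkSize
                    + ": interior padding = " + hasInteriorPadding(chunked)
                    + ", same as one-shot encoding = " + chunked.equals(whole));
        }
    }
}
```

Since 1024 % 3 = 1, every full 1024-byte chunk is encoded with a trailing "==", which ends up in the middle of the concatenated output. A chunk size of 3 * 1024 encodes to exactly 4096 characters per full chunk, so padding can only appear at the very end.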
boileryao