What is the best multi-part Base64 encoder in Java?

Question

I have tested different Base64 encoders (mig64, iHarder, Sun, etc.). They all seem to need the whole data in memory for conversion.

If I want to encode a large file (stream) > 1 GB in a multi-threaded fashion, which codec implementation can be used without corrupting the file? Commons Codec seems to have the Base64OutputStream wrapper. Are there any other solutions?

To make it clear: I have a 1 TB file that needs to be Base64-encoded. The machine has 2 GB of RAM. What is the fastest way to do it in Java?

So your particular definition of "best" is "capable of encoding a stream"? — Thorbjørn Ravn Andersen, Apr 14 '11 at 18:38
Do you have other criteria? As it is currently written, this question is subjective (from the title) and/or a "list of X" request (based on the last part). — Pops, Apr 14 '11 at 18:40
In a concurrent fashion. Let it be a file, i.e. a fixed stream of bytes. — zudokod, Apr 14 '11 at 18:41
OK, I meant that if I have to encode a bigger file, the criterion will be size vs. performance, i.e. n GB/hour. — zudokod, Apr 14 '11 at 18:42
@hGx, your description is very vague - consider rewording your question more towards a specification. — Thorbjørn Ravn Andersen, Apr 14 '11 at 19:11
If you are willing to test some more implementations, could you also compare the one I just put up at https://github.com/jhorstmann/Base64 ? — Jörn Horstmann, Apr 14 '11 at 19:55
interesting, do you have a benchmark like mig64 did? http://migbase64.sourceforge.net/ — zudokod, Apr 14 '11 at 20:49
hGx: Just did some benchmarks, for encoding from byte array to string mine was a bit faster than commons codec, mig64 was about 3 times faster. For file or stream based operations things should look different. — Jörn Horstmann, Apr 15 '11 at 00:50

score 1 · Accepted Answer · answered Apr 14 '11 at 18:48

I'm not sure offhand which encoder is fastest; you'll have to measure each to determine that. However, you can avoid the memory problem and get concurrency by splitting the file into chunks. Just make sure you split them on a 6-byte boundary (6 input bytes turn into exactly 8 Base64 characters; any multiple of 3 bytes works, since Base64 maps 3 bytes to 4 characters).
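
As a minimal sketch of why the boundary matters, the following compares chunk-by-chunk encoding against encoding everything at once. It assumes `java.util.Base64` (Java 8 and later, so newer than this thread); the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

final class ChunkBoundaryDemo {

    // Encodes the data piece by piece and concatenates the Base64 text.
    // The result only matches a one-shot encoding when chunkSize is a
    // multiple of 3; otherwise padding ends up in the middle of the output.
    static String encodeInChunks(byte[] data, int chunkSize) {
        Base64.Encoder encoder = Base64.getEncoder();
        StringBuilder out = new StringBuilder();
        for (int start = 0; start < data.length; start += chunkSize) {
            int end = Math.min(start + chunkSize, data.length);
            out.append(encoder.encodeToString(Arrays.copyOfRange(data, start, end)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] data = "The quick brown fox jumps over the lazy dog"
                .getBytes(StandardCharsets.US_ASCII);
        String whole = Base64.getEncoder().encodeToString(data);

        // 6 is a multiple of 3, so the pieces join up cleanly.
        System.out.println(whole.equals(encodeInChunks(data, 6)));
        // 4 is not, so every piece carries its own "==" padding.
        System.out.println(whole.equals(encodeInChunks(data, 4)));
    }
}
```

Only the last chunk may have a length that is not a multiple of 3, because padding is legal only at the very end of the output.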

I'd recommend picking a reasonable chunk size and using an ExecutorService to manage a fixed number of threads to do the processing. You can share a RandomAccessFile between them and write to the appropriate places. You'll of course have to calculate the output chunk offsets (just multiply by 8 and divide by 6).
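
One way this could look, as a rough sketch rather than a tuned implementation: it assumes `java.util.Base64` (Java 8+) and swaps the shared RandomAccessFile for `FileChannel` positional reads and writes, because a seek followed by a write on a shared file is not atomic across threads. `ParallelBase64` and its methods are hypothetical names, and the output has no line breaks.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelBase64 {

    // Output position of a chunk that starts at inputOffset (a multiple of 3).
    static long outputOffset(long inputOffset) {
        return inputOffset / 3 * 4;
    }

    static void encode(Path in, Path out, int chunkSize, int threads) throws Exception {
        if (chunkSize <= 0 || chunkSize % 3 != 0) {
            throw new IllegalArgumentException("chunkSize must be a positive multiple of 3");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (FileChannel src = FileChannel.open(in, StandardOpenOption.READ);
             FileChannel dst = FileChannel.open(out, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = src.size();
            List<Future<Void>> pending = new ArrayList<>();
            for (long pos = 0; pos < size; pos += chunkSize) {
                final long start = pos;
                final int len = (int) Math.min(chunkSize, size - start);
                // Buffers are allocated inside the task, so only about
                // `threads` chunks are held in memory at any time.
                Callable<Void> task = () -> {
                    encodeChunk(src, dst, start, len);
                    return null;
                };
                pending.add(pool.submit(task));
            }
            for (Future<Void> f : pending) {
                f.get(); // rethrows any failure from a worker
            }
        } finally {
            pool.shutdown();
        }
    }

    private static void encodeChunk(FileChannel src, FileChannel dst, long start, int len)
            throws IOException {
        ByteBuffer input = ByteBuffer.allocate(len);
        while (input.hasRemaining()) {
            // Positional read: does not touch the channel's shared position.
            if (src.read(input, start + input.position()) < 0) {
                throw new EOFException("input ended before the chunk was complete");
            }
        }
        input.flip();
        ByteBuffer encoded = Base64.getEncoder().encode(input);
        long target = outputOffset(start);
        while (encoded.hasRemaining()) {
            target += dst.write(encoded, target); // positional write
        }
    }
}
```

A call such as `ParallelBase64.encode(source, target, 3 * 1024 * 1024, 4)` would use 3 MiB chunks on four threads; both numbers are placeholders to be tuned by measurement.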

Honestly though you might not realize much performance gain here with concurrency. It could just overwhelm the hard drive with random access. I'd start with chunking the file up using a single thread. See how fast that is first. You can probably crunch a 1GB file faster than you think. As a rough guess I'd say 1 minute on modern hardware, even writing to the same drive you're reading from.
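
For the single-threaded baseline, a streaming encoder keeps memory use constant regardless of file size. The sketch below assumes `java.util.Base64` from Java 8+ and its `wrap(OutputStream)` method; `StreamingBase64`, the method names and the 64 KiB buffer size are illustrative choices.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

final class StreamingBase64 {

    // Copies everything from in to out, Base64-encoding on the way.
    // Only one small buffer is held in memory at a time.
    static void encode(InputStream in, OutputStream out) throws IOException {
        // Closing the wrapper writes the final padding and also closes out.
        try (OutputStream b64 = Base64.getEncoder().wrap(out)) {
            byte[] buffer = new byte[64 * 1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                b64.write(buffer, 0, n);
            }
        }
    }

    static void encodeFile(Path source, Path target) throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(source));
             OutputStream out = new BufferedOutputStream(Files.newOutputStream(target))) {
            encode(in, out);
        }
    }
}
```

Timing `encodeFile` on a sample of the real data gives the number to beat before adding threads.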

WhiteFang34


How do I ensure integrity when, say, line breaks are needed after every 76 characters? – zudokod Apr 14 '11 at 18:51
I wouldn't split it on line breaks, you'll need to split on a fixed byte boundary. If you read line by line then you can't guarantee that each line is a multiple of 6 bytes. – WhiteFang34 Apr 14 '11 at 18:54
I meant when writing. By spec, the output should have a line break after every 76 characters. That is, the file is converted to another file of characters, with a line break after every 76 characters according to the specification. – zudokod Apr 14 '11 at 18:58
Ah, I see. You need a chunk size that produces full 76-character lines. Then you can calculate the destination offset. For example, 3648 input bytes will produce 4864 output characters in Base64. That's 64 lines of output. Assuming that you have 2 bytes for a CRLF at the end of each line, that adds another 128 bytes of output. So for each 3648-byte input chunk you'll get a 4992-byte output chunk. Just write to the correct offset in the file for the chunk you're processing. – WhiteFang34 Apr 14 '11 at 19:06
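
The arithmetic in the comment above can be checked with a short sketch. It assumes `java.util.Base64.getMimeEncoder` (Java 8+), which inserts the separator between lines but not after the last one, so the trailing CRLF is appended by hand. `WrappedChunkLayout` and its members are hypothetical names.

```java
import java.util.Arrays;
import java.util.Base64;

final class WrappedChunkLayout {
    static final int LINE_CHARS = 76;      // Base64 characters per output line
    static final int INPUT_PER_LINE = 57;  // 57 input bytes encode to 76 characters
    static final byte[] CRLF = {'\r', '\n'};

    // Input bytes consumed by a chunk of linesPerChunk full lines.
    static long inputChunkSize(int linesPerChunk) {
        return (long) linesPerChunk * INPUT_PER_LINE;
    }

    // Output bytes produced by such a chunk, CRLF after every line included.
    static long outputChunkSize(int linesPerChunk) {
        return (long) linesPerChunk * (LINE_CHARS + CRLF.length);
    }

    // Where chunk number chunkIndex (0-based) starts in the output file.
    static long outputOffset(long chunkIndex, int linesPerChunk) {
        return chunkIndex * outputChunkSize(linesPerChunk);
    }

    // Encodes one chunk as 76-character lines, each terminated by CRLF.
    // Every chunk except the last must be a multiple of 57 bytes long.
    static byte[] encodeChunk(byte[] chunk) {
        byte[] body = Base64.getMimeEncoder(LINE_CHARS, CRLF).encode(chunk);
        if (body.length == 0) {
            return body;
        }
        // The MIME encoder leaves the last line unterminated; add its CRLF.
        byte[] out = Arrays.copyOf(body, body.length + CRLF.length);
        System.arraycopy(CRLF, 0, out, body.length, CRLF.length);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(inputChunkSize(64));                // 3648
        System.out.println(outputChunkSize(64));               // 4992
        System.out.println(encodeChunk(new byte[3648]).length); // 4992
    }
}
```

With 64 lines per chunk, the chunk at index 2 would start at output byte 9984.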
