I'm implementing GZIP compression for interactions between some of our systems. The systems are written in both Java and C#, so we used GZIP streams on both sides since both standard libraries support them.
On the C# side, everything works up to and including our biggest test files (70MB uncompressed); on the Java side, however, we run out of heap space. We've tried increasing the heap size as far as the IDE allows, but the issue is still not resolved.
I've taken some steps to try to optimize the Java code, but nothing keeps the data from piling up in the heap. Is there a good way to handle this? Below is a subset of my current solution (it works on smaller streams).
EDIT: Following code modified with recommendations from @MarkoTopolnik. With changes, 17 million characters are read before crash.
public static String decompress(byte[] compressed, int size) throws IOException
{
    char[] buf = new char[Math.min(size, 2048)];
    Writer ret = new StringWriter(buf.length);
    GZIPInputStream decompresser =
            new GZIPInputStream(new ByteArrayInputStream(compressed), buf.length);
    BufferedReader reader =
            new BufferedReader(new InputStreamReader(decompresser, "UTF-8"));
    int charsRead;
    while ((charsRead = reader.read(buf, 0, buf.length)) != -1)
    {
        ret.write(buf, 0, charsRead);
    }
    reader.close(); // also closes the underlying GZIPInputStream
    return ret.toString();
}
The original code dies after reading a little over 7.6 million chars into an ArrayList; the stack trace indicates that the ArrayList.add() call is the cause (it fails while expanding the internal array). With the edited code above, a call to AbstractStringBuilder.expandCapacity() is what kills the program.
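Some back-of-envelope arithmetic may explain why the crash happens around that point. This is a sketch based on the assumption that StringWriter's backing buffer roughly doubles its char[] on expansion (typical of HotSpot's AbstractStringBuilder), so during the copy both the old and new arrays are live at once:

```java
// Rough peak-memory estimate for expanding a StringWriter holding N chars.
// Assumes ~2 bytes per char and a roughly-doubling growth policy; ignores
// the compressed byte[] and decompressed bytes also held elsewhere.
public class HeapEstimate {
    static long peakBytesDuringExpand(long chars) {
        long oldArray = chars * 2;    // current backing array
        long newArray = oldArray * 2; // doubled replacement array
        return oldArray + newArray;   // both live during the copy
    }

    public static void main(String[] args) {
        long peak = peakBytesDuringExpand(17_000_000L);
        System.out.println(peak / (1024 * 1024) + " MB"); // prints 97 MB
    }
}
```

On that estimate, the expansion at ~17 million chars transiently needs on the order of 100MB for the character data alone, on top of the compressed input and any other live objects, which would plausibly exhaust a modest heap.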
Is there a less memory-expensive way to implement a dynamic array, or a completely different approach I can use to get a String from the decompressed stream? Any suggestions would be greatly appreciated!