byte array gzip and base64 encoding results in OOM error upon retrieval and decode+unzip at high load

Question

We have an XML document of about 1.4 MB which we gzip-compress, encode to Base64, and save in Cosmos. When an update arrives, we read the document from Cosmos, decode it from Base64, and unzip it to get the original string. What we observe is that under high load the slanted apostrophe character produces junk data when the document is saved back to Cosmos during update processing. The Base64-encoded data looks like this (a short prefix followed by a very long run of `A` characters):

    /F9nYk3vKlhqHb65KybqXTJfLvTvuy24HFwOq1wOT55oEkdJ+0bmcuWJJisvbfanpsb7//2//w8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA........

Decoding and unzipping this gives an OOM error: the decompressed size grows into gigabytes and is filled with junk characters such as � and ��.

Logic for encoding and gzip

    String compressed;
    try (var baos = new ByteArrayOutputStream(); var gzipOut = new GZIPOutputStream(baos)) {
      gzipOut.write(data.getBytes(StandardCharsets.UTF_8));
      // Close explicitly so the gzip trailer is written before baos is read.
      gzipOut.close();
      compressed = Base64.getEncoder().encodeToString(baos.toByteArray());
    } catch (IOException e) {
      throw new FOSInjestorApplicationException(Errors.UNEXPECTED_ERROR
              , Errors.UNEXPECTED_ERROR.getDescription());
    }

Logic for decoding and unzip

    byte[] decodebase64 = Base64.getDecoder().decode(arr);
    byte[] unzipped;
    // Gunzip the Base64-decoded bytes, not the raw Base64 input.
    try (var bais = new ByteArrayInputStream(decodebase64); var gzipIn = new GZIPInputStream(bais)) {
      unzipped = gzipIn.readAllBytes();
    } catch (IOException e) {
      throw new FOSInjestorApplicationException(Errors.UNEXPECTED_ERROR
              , Errors.UNEXPECTED_ERROR.getDescription());
    }
    // No charset is passed here, so the JVM default charset is used.
    return new String(unzipped);

When we place the same document, with the slanted apostrophe, in non-prod, it works fine.

I am using Java 11 and the Java Cosmos 4.x SDK. What could cause this to fail at high load?

We processed a large number of updates (one at a time) on a document that contained a special character, the slanted apostrophe. The updates should not corrupt the data, but after decoding and unzipping we found junk characters (��) that kept growing until the payload reached about 1 GB and caused an OOM error.

score 0 · Answer 1 · answered Jul 28 '23 at 06:53

We checked why the cycle of decoding and decompressing, then compressing and encoding again, was causing problems with the slanted apostrophe. The encode path converts the string to bytes as UTF-8, but the decode path builds the string with the JVM default charset. Java 11 takes that default from the file.encoding setting of the platform, and our application VM was running with ISO-8859-1. ISO-8859-1 does not understand the tilted apostrophe, so its UTF-8 bytes were read back as several junk characters (??????), and every further unzip-and-zip cycle multiplied them. The junk grew exponentially with each update, until a roughly 2 MB junk payload saved in Cosmos expanded to about 1 GB when decoded and decompressed, and the application hit the OOM error. The fix was to add the -Dfile.encoding=UTF-8 JVM parameter explicitly; after that we observed that the special character (tilted apostrophe) was read back correctly as an apostrophe '.
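The growth described above can be reproduced in isolation. The following is a minimal sketch, not the application's actual code: the class name `GzipBase64RoundTrip`, the helper `lengthAfterCycles`, and the sample document are hypothetical, and it assumes the slanted apostrophe is U+2019 (right single quotation mark). It passes the decode charset explicitly so that the ISO-8859-1 behaviour can be compared with UTF-8 without changing JVM flags.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipBase64RoundTrip {

    // Encode side, as in the question: UTF-8 bytes -> gzip -> Base64 text.
    static String compress(String data) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gzipOut = new GZIPOutputStream(baos)) {
            gzipOut.write(data.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed at this point, so the trailer has been written.
        return Base64.getEncoder().encodeToString(baos.toByteArray());
    }

    // Decode side: Base64 text -> gunzip -> String built with the given charset.
    static String decompress(String base64, Charset charset) {
        byte[] compressed = Base64.getDecoder().decode(base64);
        try (GZIPInputStream gzipIn =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gzipIn.readAllBytes(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: simulate repeated "read, then save again" updates
    // and return the document length in chars after the given number of cycles.
    static int lengthAfterCycles(String doc, Charset decodeCharset, int cycles) {
        String current = doc;
        for (int i = 0; i < cycles; i++) {
            current = decompress(compress(current), decodeCharset);
        }
        return current.length();
    }

    public static void main(String[] args) {
        // Assumption: the slanted apostrophe is U+2019.
        String doc = "<note>it\u2019s</note>";
        System.out.println("default charset: " + Charset.defaultCharset());
        for (int cycles = 0; cycles <= 5; cycles++) {
            System.out.printf("cycles=%d utf8=%d iso8859_1=%d%n",
                cycles,
                lengthAfterCycles(doc, StandardCharsets.UTF_8, cycles),
                lengthAfterCycles(doc, StandardCharsets.ISO_8859_1, cycles));
        }
    }
}
```

With UTF-8 on both sides the length stays constant. With ISO-8859-1 on the decode side, the three UTF-8 bytes of the apostrophe become three characters, each of which is written back as two UTF-8 bytes on the next save, so the junk doubles on every cycle. Passing `StandardCharsets.UTF_8` to `new String(...)` in the decode path would remove the dependency on `-Dfile.encoding` altogether, which may be a more robust choice than relying on the JVM flag alone.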
