
I'm working with a system that returns a GZIP string (not a binary or stream) in its response. For example: gzip:H4sIAAAAAAAAALS9665wyXEd9i7zm21U37vnVwwZAREEAZJYgBEjMKq6uiVLNimQlBDRyLtnrX3GD+ACIlEDznyaffbpXV21Vl1W/bdf/vKv/3R/+fWXf/4n17/cX373y/l7/cPfXf+3f/nl1zxWrmW2Vfeev/vlL/p3f/7l1//4f//uF7vvj3+6f/unP/zy63/75T/9p//+CPz97375F/0v//zb3/2q/1X/+sc... (I had to omit the full string because it's very long). I can verify that it's GZIP-compressed, since tools such as https://www.multiutil.com/gzip-to-text-decompress/ return the expected uncompressed string.

However I'm stuck trying to find a way to handle this string in Java.

I've tried

final GZIPInputStream gzipInput = new GZIPInputStream(new ByteArrayInputStream(compressedString.getBytes()));

but this line is throwing java.util.zip.ZipException: Not in GZIP format.
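That exception comes from GZIPInputStream's header check: a GZIP stream must begin with the magic bytes 0x1f 0x8b (RFC 1952), but getBytes() returns the bytes of the text characters themselves. The following is a minimal diagnostic sketch; the helper name looksLikeGzip is hypothetical, and it uses the short sample string that appears in the answer.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class GzipMagicCheck {
    // A GZIP member always begins with the magic bytes 0x1f 0x8b (RFC 1952).
    static boolean looksLikeGzip(byte[] data) {
        return data.length >= 2
                && (data[0] & 0xff) == 0x1f
                && (data[1] & 0xff) == 0x8b;
    }

    public static void main(String[] args) {
        String text = "H4sIAAAAAAAACvNIzcnJVwjPL8pJAQBWsRdKCwAAAA==";

        // getBytes() yields the characters themselves ('H' = 0x48, '4' = 0x34, ...),
        // which is why GZIPInputStream reports "Not in GZIP format".
        byte[] raw = text.getBytes(StandardCharsets.US_ASCII);
        System.out.println("raw text looks like gzip: " + looksLikeGzip(raw));

        // Base64-decoding first recovers the underlying bytes, which start 0x1f 0x8b.
        byte[] decoded = Base64.getDecoder().decode(text);
        System.out.println("decoded bytes look like gzip: " + looksLikeGzip(decoded));
    }
}
```

A string starting with "H4sI" is a strong hint: those four Base64 characters decode to 0x1f 0x8b 0x08, the GZIP magic bytes followed by the deflate method byte.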

I've searched around here, but the similar posts are about GZIP data in an HTTP response that can be read from the stream. In my case, the GZIP data is already given to me as a string.

Any pointers would be greatly appreciated, thank you.

SW Williams
  • I could be wrong, but isn't that a base64 encoded string? Then you'd have to base64 decode it. – ewokx Apr 13 '23 at 04:31
  • Sorry, bad example. The actual string is very long, so I just used the same tool I used to decompress the string to encode "Hello World", but apparently it also Base64-encodes it. I edited my post to include part of the actual GZIP string. – SW Williams Apr 13 '23 at 04:41
  • Turns out my original string was indeed base64 encoded. The prefix threw me off. Thanks for the help! – SW Williams Apr 13 '23 at 05:34
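Since the leading "gzip:" label was the confusing part, here is a minimal sketch of stripping it before Base64-decoding. It assumes the label is always exactly "gzip:"; the constant PREFIX and the method decodePayload are hypothetical names, and the real system's format is not documented in the question. Without stripping, Base64.getDecoder() rejects the ':' character with an IllegalArgumentException.

```java
import java.util.Base64;

public class PrefixedPayload {
    // Hypothetical label: the question only shows payloads starting with "gzip:".
    private static final String PREFIX = "gzip:";

    // Removes the leading label (if present), then Base64-decodes the remainder.
    static byte[] decodePayload(String payload) {
        String base64 = payload.startsWith(PREFIX)
                ? payload.substring(PREFIX.length())
                : payload;
        return Base64.getDecoder().decode(base64);
    }

    public static void main(String[] args) {
        byte[] bytes = decodePayload("gzip:H4sIAAAAAAAACvNIzcnJVwjPL8pJAQBWsRdKCwAAAA==");
        // The decoded bytes should begin with the GZIP magic number 0x1f 0x8b.
        System.out.printf("%d bytes, starting 0x%02x 0x%02x%n",
                bytes.length, bytes[0] & 0xff, bytes[1] & 0xff);
    }
}
```

The resulting byte array can then go into a ByteArrayInputStream and GZIPInputStream as described in the answer.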

1 Answer


This:

H4sIAAAAAAAACvNIzcnJVwjPL8pJAQBWsRdKCwAAAA==

is Base64: an encoding that represents arbitrary bytes as characters, so that the data survives just about any text-only medium.

It is inefficient, though: every 3 bytes of input become 4 characters (so at least 4 bytes in transfer), which inflates your data by about a third.
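To make the 3-to-4 ratio concrete, this sketch measures the padded Base64 output length for a few input sizes using java.util.Base64; the helper name encodedLength is just for illustration. Padded Base64 produces 4 * ceil(n / 3) characters for n bytes.

```java
import java.util.Base64;

public class Base64Overhead {
    // Length of the padded Base64 text produced for n input bytes.
    static int encodedLength(int n) {
        return Base64.getEncoder().encode(new byte[n]).length;
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 31, 300, 3000}) {
            // Expected: 4 * ceil(n / 3) characters, roughly 33% larger than the input.
            System.out.println(n + " bytes -> " + encodedLength(n) + " chars"
                    + " (formula: " + 4 * ((n + 2) / 3) + ")");
        }
    }
}
```

For example, the 44-character sample string above corresponds to 31 bytes of gzip data.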

What you probably have here is data that went through this process:

  • First, GZip this data.
  • Second, Base64 the gzipped data.

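As an illustration of that two-step process, here is a sketch of what the producing side might look like in Java. This is an assumption: the question does not show how the system actually builds its response, and the output of this code may not match the sample string byte for byte (header fields such as the OS byte can differ between implementations). The method name compressToBase64 is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPOutputStream;

public class GzipThenBase64 {
    static String compressToBase64(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Step 1: gzip the UTF-8 bytes; closing the stream writes the CRC32/size trailer.
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Step 2: Base64-encode the compressed bytes so they can travel as text.
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    public static void main(String[] args) {
        // Starts with "H4sI", the Base64 form of the gzip header bytes 0x1f 0x8b 0x08.
        System.out.println(compressToBase64("Hello World"));
    }
}
```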
That is an odd combination: first store the data compactly, then send it inefficiently. It only makes sense if the data travels through a medium that can't carry raw bytes (such as plain JSON or HTTP headers), and in that case the medium itself might be worth rethinking.

At any rate, to get back to the original data, apply the same steps, in reverse:

  • First, Base64-decode the string, which gives you a byte array.
  • Next, wrap that array in a ByteArrayInputStream and pass it to a GZIPInputStream.

Base64 support is built into Java (java.util.Base64, available since Java 8):

// imports: java.util.Base64, java.util.zip.GZIPInputStream, java.io.ByteArrayInputStream
byte[] compressedData = Base64.getDecoder().decode("H4sIAAAAAAAACvNIzcnJVwjPL8pJAQBWsRdKCwAAAA==");
var gz = new GZIPInputStream(new ByteArrayInputStream(compressedData));
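To go all the way back to text, the stream still has to be read out. The following is a self-contained sketch of both steps; it assumes the original text was UTF-8 before compression (the question doesn't say) and uses InputStream.readAllBytes(), which requires Java 9 or later. The method name decompress is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;

public class Base64GzipDecode {
    // Reverses "gzip, then Base64": Base64-decode first, then gunzip.
    static String decompress(String base64) {
        // Step 1: undo the Base64 text encoding to recover the raw gzip bytes.
        byte[] compressedData = Base64.getDecoder().decode(base64);
        // Step 2: read those bytes through GZIPInputStream; try-with-resources closes it.
        try (InputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressedData))) {
            // Assumption: the text was UTF-8 before it was compressed.
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            // Bad or truncated gzip data ends up here (e.g. ZipException).
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decompress("H4sIAAAAAAAACvNIzcnJVwjPL8pJAQBWsRdKCwAAAA=="));
    }
}
```

For the sample string, which the asker generated from "Hello World", this should print Hello World.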
rzwitserloot
  • Sorry I gave a bad example, the actual string is very long so I just used the same tool I used to decompress the string to encode "Hello World", but apparently it also Base64 encodes it. I edited my post to include part of the actual GZIP string – SW Williams Apr 13 '23 at 04:41
  • Although actually it turns out my original string was indeed Base64 encoded... Thanks for the help – SW Williams Apr 13 '23 at 05:10