0

I downloaded a CloudTrail .gz file from my S3 bucket, but when I tried to unzip it using a freshly downloaded 7-Zip, the error message said "cannot open file xxx as archive". I tried saving the log in different S3 buckets. I deleted the trail and recreated it. I tried stopping the trail and then downloading the file. The result is always the same. What's going on?

John Rotenstein
  • If you use the command line, then simply use the `unzip` command with your file name. – Amit Jul 26 '17 at 03:59
  • How did you download the .gz file? I've noticed that when I download it via the S3 Management Console, it is already unzipped. Take a look at the file in a text editor and see whether it appears as normal JSON content. – John Rotenstein Jul 26 '17 at 04:23
  • Thank you John! You are right. It is already unzipped. –  Jul 26 '17 at 04:24

1 Answer

3

My testing has shown that downloading the .gz file within the Amazon S3 console actually saves an unzipped version of the file. Note that the saved file then contains the plain, uncompressed content rather than gzip data, which is why an archive tool reports that it cannot open it as an archive.

I think this is because browsers can handle HTML files compressed in .gz format, so they natively decompress such files.
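
One way to confirm whether a downloaded log is still compressed is to look at its first two bytes: every gzip stream starts with `1f 8b`. The following is only a minimal sketch, not an official AWS tool; the function name `load_cloudtrail_log` and the file name in the usage note are hypothetical, and it assumes the log is UTF-8 JSON.

```python
import gzip
import json
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def load_cloudtrail_log(path):
    """Parse a downloaded log, decompressing only if it is really gzip data.

    Hypothetical helper: it trusts the file contents, not the .gz extension.
    """
    raw = Path(path).read_bytes()
    if raw[:2] == GZIP_MAGIC:
        # Still compressed (for example, fetched with a tool that does not
        # decode Content-Encoding), so decompress before parsing.
        raw = gzip.decompress(raw)
    # Otherwise the download was already decompressed; parse it directly.
    return json.loads(raw.decode("utf-8"))
```

Calling `load_cloudtrail_log("downloaded-log.json.gz")` (a placeholder name) should then work whether or not the browser already decompressed the file.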

John Rotenstein
  • Browsers don't automatically handle `.gz` *files* but they do automatically handle `Content-Encoding: gzip`. This is unfortunately incorrect design behavior on the part of CloudTrail, but it's a mistake I see people make: the objects are stored in S3 with `Content-Encoding: gzip`. It is fundamentally incorrect to do both that *and* give the file a `.gz` extension **unless** the object has been gzipped *twice*, which there is of course no reason to ever do. For a `.gz` file to be *downloadable as a `.gz` file*, `Content-Encoding` is blank and `Content-Type` is `application/gzip` or similar. – Michael - sqlbot Jul 26 '17 at 17:27
  • ...in the current situation, the files should have been created with only a `.json` extension and the `Content-Type` and `Content-Encoding` set as they are now, because the gzipping is not of the *file* per se but rather of the HTTP *representation* of the entity -- a JSON document. Adding `.gz` creates a significant and avoidable ambiguity. – Michael - sqlbot Jul 26 '17 at 17:31
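
The header behavior described in the comments above can be illustrated with a deliberately simplified model of what an HTTP client such as a browser writes to disk. This is a sketch under stated assumptions, not real browser code: `body_as_saved` is a hypothetical name, only the `gzip` content coding is handled, and the `application/json` content type in the usage note is an assumed value.

```python
import gzip


def body_as_saved(headers, body):
    """Simplified model of the bytes a browser saves for an HTTP response.

    Hypothetical helper: real clients support more content codings.
    """
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    encoding = normalized.get("content-encoding", "").strip().lower()
    if encoding == "gzip":
        # The gzip layer belongs to the HTTP representation, so the client
        # removes it and saves the underlying document.
        return gzip.decompress(body)
    # No content coding: the bytes are saved exactly as stored.
    return body
```

With an object stored as `stored = gzip.compress(b'{"Records": []}')`, passing `{"Content-Encoding": "gzip", "Content-Type": "application/json"}` yields plain JSON bytes, whereas passing `{"Content-Type": "application/gzip"}` yields the gzip bytes unchanged, which is the case where a `.gz` name would match the saved contents.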