
In the last week I have noticed that about 5-10% of the .gz files I copy down using gsutil (now v3.42) are corrupt... When I look at the files in the GCS UI, sure enough they are larger than what gsutil downloaded.

(FileNameHere).gz: Unexpected end of archive
(FileNameHere).gz: CRC failed in (FileNameHere). The file is corrupt

The use case is copying gzip files from GCS down to one of our Windows Server 2008 R2 machines.
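For reference, one quick way to compare the remote object against what actually landed on disk is gsutil ls -l on the GCS side and ls -l locally; the bucket, object, and path names below are placeholders, not from the original post:

# Compare the object's size in GCS against the size of the downloaded file.
gsutil ls -l gs://YourBucketName/YourFileName_01.gz
ls -l /cygdrive/d/YourPathHere/YourFileName_01.gz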

Has anyone else seen this problem?

Mark
  • Short of providing more details, as suggested by Mike, is it possible that your objects in GCS have both content-type and content-encoding set to gzip, but are actually only compressed once? If that was the case, the compression would be undone as part of handling http-level content-encoding, which would leave you with a local .gz file that is actually uncompressed. A fix would be to not set the content-encoding on these objects. – lot Apr 25 '14 at 16:03
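For anyone checking the scenario lot describes, the object's metadata can be inspected with gsutil ls -L; the names below are placeholders, and the setmeta line is only a sketch of how a mistakenly set Content-Encoding might be cleared (see gsutil help setmeta for the exact syntax in your gsutil version):

# Inspect the object's metadata; look at Content-Type and Content-Encoding.
gsutil ls -L gs://YourBucketName/YourFileName_01.gz

# Sketch: remove a mistakenly set Content-Encoding so downloads are not
# transparently decompressed (confirm syntax with "gsutil help setmeta").
gsutil setmeta -h "Content-Encoding" gs://YourBucketName/YourFileName_01.gz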

2 Answers


Can you please provide us with a specific example: Complete bucket & object name, the specific date/time when you downloaded the object, and the size of the file after downloading using gsutil? That way we can investigate and try to reproduce the case you're seeing.

If you'd prefer not to post the specific bucket and object names on StackOverflow, you can communicate privately with the GCS team by emailing gs-team@google.com.

Thanks,

Mike

Mike Schwartz
  • To explain the situation: we had a bash script that was downloading a number of large files from GCS to our servers. During that process, network inconsistencies were causing it to fail silently partway through those downloads (silently at least from the perspective of the SSIS package calling the bash script). The answer below has an example snippet you can use to have the bash script retry downloading those files in the event of an error. (I'd like to give Mike on the gsutil team credit for the suggestion... I tried it out and it worked.) – Mark May 13 '14 at 13:53

This snippet goes along with the comments from above (retries the copy command until successful):

#!/bin/sh
# Retry each copy until it succeeds: gsutil cp exits non-zero on failure,
# so the "until" loop keeps re-running the download for that object.

export PATH=${PATH}:/cygdrive/c/gsutil
ZIPFOLDER="d:/YourPathHere"

for obj in \
  gs://YourBucketName/YourFileName_01.gz \
  gs://YourBucketName/YourFileName_02.gz \
  gs://YourBucketName/YourFileName_03.gz \
...
  gs://YourBucketName/YourFileName_NN.gz
do
    until gsutil cp "$obj" "$ZIPFOLDER" ; do :; done
done
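
A possible extension of the snippet above (not part of the original answer): verify each archive with gzip -t after copying and retry on a failed CRC check. This assumes gzip is available in the Cygwin environment and uses the same placeholder names:

#!/bin/sh
export PATH=${PATH}:/cygdrive/c/gsutil
ZIPFOLDER="d:/YourPathHere"

for obj in \
  gs://YourBucketName/YourFileName_01.gz \
  gs://YourBucketName/YourFileName_02.gz
do
    localfile="$ZIPFOLDER/$(basename "$obj")"
    # Retry until the copy succeeds AND the archive passes gzip's CRC test.
    until gsutil cp "$obj" "$ZIPFOLDER" && gzip -t "$localfile" ; do
        echo "Retrying $obj ..." >&2
    done
done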
Mark