
On Google Cloud I have a Linux Compute Engine instance and a bucket. I have mounted the bucket as a drive on the instance using gcsfuse, as recommended by Google, and from time to time a big 7-Zip archive (tens of GBs) is uploaded to the bucket. When I log into the instance's terminal, go to the mounted bucket folder and try to unzip the file (in the same location) with the command 7z x myarchive.7z, it extracts up to 100% (which takes a couple of minutes) and at the end it fails with:

ERROR: E_FAIL

Archives with Errors: 1

After that, if I look at the bucket's contents, the unzipped file's name is present, but it has a size of 0 KB.

I understand that E_FAIL is normally associated with a lack of space, but a Google bucket is supposed to have unlimited space (with restrictions on individual file sizes). The command df -h, for example, reports that the mounted bucket has 1 petabyte of available storage.
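For reference, the setup looks roughly like this (the bucket name, mount point and archive name below are placeholders, not my exact configuration):

    # Mount the bucket with gcsfuse (names are placeholders)
    gcsfuse my-bucket /mnt/my-bucket

    # Extract the archive in place, inside the mounted directory
    cd /mnt/my-bucket
    7z x myarchive.7z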

Anyone out there with a similar setup / problem?

Vee6
  • Have you tried to move the file to the GCE instance disk and unzip it there? (just to check if the problem is related to the bucket mount or not). Also, what's the capacity of the default GCE VM disk? Maybe unzipping produces temporary files too big for that VM disk. – norbjd Apr 15 '19 at 13:56
  • @norbjd I am currently in the process of transferring the file to a remote machine where I can unzip it. The VM's disk is pretty small, around 10 GB. Unzipping goes to 100%, so I presume the whole process gets to the end somewhere in some "memory". I'm executing this command from a location in the mounted bucket, and my reasoning is that any space being reserved is reserved there as well. Edit: there is also the fact that in the future there will be archives several TB in size, and the whole process should hold together. – Vee6 Apr 15 '19 at 14:00
  • I'm not sure that running the command from the mounted directory ensures that all files created by the unzip process will be located in that particular directory (temporary files, for example). Moreover, the filesystem of the mounted directory is not a classical FS, so it is subject to some restrictions (there are no random writes on GCS, for example). That's why the unzip process may need to use local storage to perform some tasks. – norbjd Apr 15 '19 at 14:14
  • 2
    From the docs (https://cloud.google.com/storage/docs/gcs-fuse#notes), it is stated that : "*Random writes are done by reading in the whole blob, editing it locally, and writing the whole modified blob back to Cloud Storage. Small writes to large files work as expected, but are slow and expensive.*". The "*editing it locally*" part may be the cause of your problem. Could you try to attach a disk with a bigger capacity (enough to hold all files in your archive) and re-run the unzip process? – norbjd Apr 15 '19 at 14:14
  • 1
    I resized the size of the main disk to allow sufficient space and it seems it worked out in the end. After the unzipping was at 100% it took a couple more minutes I presume to move the file from the temp folder on the hdd to the bucket. If you post your solution as an answer I will upvote it. – Vee6 Apr 16 '19 at 11:03

1 Answer


As suggested in the comments, the unzipping process may require some specific operations on the local filesystem, even if you are issuing the command from the mounted directory.

Indeed, since a gcsfuse-mounted filesystem is not a classical FS, some operations may require transfers to the local disk (this is the case for random writes, for example; see the docs):

Random writes are done by reading in the whole blob, editing it locally, and writing the whole modified blob back to Cloud Storage. Small writes to large files work as expected, but are slow and expensive.

To ensure that the unzipping process has enough available space to work, and assuming that temporary files are probably created during the process, you should increase the capacity of your local disk.
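For example, something along these lines should work (the disk name, size, zone and device names are placeholders and depend on your instance; on recent GCE images the root filesystem may also be grown automatically after a reboot):

    # Resize the VM's disk (name, size and zone are placeholders)
    gcloud compute disks resize my-instance --size=200GB --zone=us-central1-a

    # On the VM, grow the partition and the filesystem (assuming ext4;
    # device names are assumptions, check with lsblk first)
    sudo growpart /dev/sda 1
    sudo resize2fs /dev/sda1

Alternatively, you can copy the archive to the local disk, extract it there, and upload the result back with gsutil cp; this avoids gcsfuse's slow "read the whole blob, edit it locally, write it back" path entirely.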

norbjd