
I am working on a project where some of the data is provided through an S3 bucket, which I access with s3fs (S3FileSystem). I can read that data using S3FileSystem.open(path), but there are more than 360 files and it takes at least 3 minutes to read a single file. I was wondering: is there any way to download these files to my system and read them from there, instead of reading them directly from the S3FileSystem? There is another reason: although I can read all those files, once my Colab session reconnects I have to re-read them all, which takes a lot of time. I am using the following code to read the files:

import s3fs
import xarray as xr

# anonymous (public) access to the bucket
fs_s3 = s3fs.S3FileSystem(anon=True)
s3path = 'file_name'
remote_file_obj = fs_s3.open(s3path, mode='rb')
ds = xr.open_dataset(remote_file_obj, engine='h5netcdf')

Is there any way to download those files?


1 Answer


You can use s3fs-fuse (a different tool from the Python s3fs package) to mount the bucket as a local directory, then copy the files into Colab.

how to mount
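
In case that link goes stale, here is a minimal sketch of the mounting step in a Colab cell. The bucket name is a placeholder, and the public_bucket option assumes the bucket allows anonymous access (the same assumption as anon=True in your code):

# install the s3fs-fuse tool (distinct from the Python s3fs package)
!apt-get -qq install s3fs
# create a mount point and mount the public bucket onto it
!mkdir -p /s3
!s3fs your-bucket-name /s3 -o public_bucket=1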

After mounting, you can

!cp /s3/yourfile.zip /content/
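
Alternatively, since you already have an s3fs.S3FileSystem object, you can skip mounting entirely and download the files with its get method (inherited from fsspec), then open them locally. A sketch, with bucket-name/prefix/ standing in for wherever your 360 files live:

import s3fs
import xarray as xr

fs_s3 = s3fs.S3FileSystem(anon=True)

# copy the whole remote tree onto Colab's local disk once
fs_s3.get('bucket-name/prefix/', '/content/data/', recursive=True)

# later reads come from local disk and are fast
ds = xr.open_dataset('/content/data/file_name', engine='h5netcdf')

Either way, note that files under /content disappear when the Colab runtime is recycled, so you will need to re-run the copy step after a full reset.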