Python cloudpathlib download all files with suffix(csv) from AWS s3

Question

I am using cloudpathlib to bulk-download files from AWS S3. How can I download only the CSV files with this package?

from cloudpathlib import CloudPath

root_dir = CloudPath('s3://dir1/dir2/*.csv')
root_dir.download_to(to_path)

s3://dir1/dir2/ contains both CSV and PDF files. Note that I am not in a position to use boto3, and I would generally prefer to avoid loops.

Thanks.

score 1 · Answer 1 · answered Oct 29 '21 at 05:26

Try this:

from cloudpathlib import CloudPath

root_dir = CloudPath('s3://dir1/')
for f in root_dir.glob('dir2/*.csv'):
    # f.name is the last path component, so it is safe to use as the local filename
    f.download_to(f.name)

This assumes that your bucket is named dir1 and that the files are under the dir2/ prefix (folder).

kgiannakakis

Thanks. Is there a way not to use the loop? – pnna Oct 29 '21 at 06:38
`glob` returns a generator, so you still have to iterate over its results. I don't think there is a way to avoid a for loop or an equivalent construct. – kgiannakakis Oct 29 '21 at 08:25
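
If the goal is only to keep the loop out of the calling code, one option is to hide the iteration in a small helper. The sketch below is an illustration, not part of cloudpathlib: `download_matching` and the `downloads` directory are hypothetical names, and it assumes `root` is a `CloudPath` (or any object exposing `glob()` and `download_to()`). The list comprehension still iterates over every match; it just replaces the explicit `for` statement.

```python
from pathlib import Path


def download_matching(root, pattern, dest_dir):
    """Download every object under `root` that matches `pattern` into `dest_dir`.

    `root` is assumed to be a cloudpathlib CloudPath; any object with
    .glob() and per-file .download_to() would work. Hypothetical helper.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    def fetch(remote):
        # Keep only the final path component as the local filename.
        local = dest / remote.name
        remote.download_to(local)
        return local

    # Still a loop under the hood, expressed as a comprehension.
    return [fetch(f) for f in root.glob(pattern)]


# Possible usage (requires cloudpathlib with S3 support and AWS credentials):
# from cloudpathlib import CloudPath
# paths = download_matching(CloudPath("s3://dir1/"), "dir2/*.csv", "downloads")
```

The helper returns the local paths it wrote, which can be handy for passing the files straight to whatever reads the CSVs next.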
