
There are files in an AWS S3 bucket that I would like to download. They all have the same name, but each sits in a different subfolder. No credentials are required to connect to this bucket or download from it. I would like to download all the files called "B01.tif" under s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/ and save each one under the name of the subfolder it is in (for example: S2A_7VEG_20170205_0_L2AB01.tif).

Path example:

s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/2017/2/S2A_7VEG_20170205_0_L2A/B01.tif

I was thinking of using a bash script that parses the output of ls to download each file with cp and save it on my PC under a name generated from the path.

Command to use ls:

aws s3 ls s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/2017/2/ --no-sign-request

Command to download a single file:

aws s3 cp s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/2017/2/S2A_7VEG_20170205_0_L2A/B01.tif --no-sign-request B01.tif

Attempt to download multiple files:

VAR1=B01.tif
for a in s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/:    
  for b in s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/2017/:
    for c in s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/2017/2/:
    
       NAME=$(aws s3 ls s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/$a$b$c | head -1)
       
       aws s3 cp s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/$NAME/B01.tif --no-sign-request $NAME$VAR1
    
    done
  done
done

I don't know if there is a simple way to automatically go through every subfolder and save the files directly. I know my ls command approach is broken, because if there are multiple subfolders it will only capture the first one as a variable.
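For reference, the key-to-filename mapping described above can be sketched in plain Python (assuming, as in the example, that the local name is the immediate parent folder's name concatenated with the file name):

```python
# Sketch of the naming scheme from the question: given a full object key,
# build the local filename from the parent folder name plus the file name.
key = 'sentinel-s2-l2a-cogs/7/V/EG/2017/2/S2A_7VEG_20170205_0_L2A/B01.tif'

parts = key.split('/')
local_name = parts[-2] + parts[-1]  # folder name + file name, concatenated

print(local_name)  # S2A_7VEG_20170205_0_L2AB01.tif
```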

– Nihilum

1 Answer


It's easier to do this in a programming language rather than as a shell script.

Here's a Python script that will do it for you:

import boto3

BUCKET = 'sentinel-cogs'
PREFIX = 'sentinel-s2-l2a-cogs/7/V/EG/'
FILE = 'B01.tif'

s3_resource = boto3.resource('s3')

# List every object under the prefix, keep only the B01.tif files, and
# flatten the rest of the key into the local filename.
for obj in s3_resource.Bucket(BUCKET).objects.filter(Prefix=PREFIX):
    if obj.key.endswith(FILE):
        target = obj.key[len(PREFIX):].replace('/', '_')
        obj.Object().download_file(target)
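Note that the `target` expression flattens everything after the prefix, so the year/month folders end up in the local name too. A quick dry run of just that expression on a sample key (no S3 access needed):

```python
# Dry run of the answer's target-name expression on a sample key.
PREFIX = 'sentinel-s2-l2a-cogs/7/V/EG/'
key = 'sentinel-s2-l2a-cogs/7/V/EG/2017/2/S2A_7VEG_20170205_0_L2A/B01.tif'

target = key[len(PREFIX):].replace('/', '_')
print(target)  # 2017_2_S2A_7VEG_20170205_0_L2A_B01.tif
```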
– John Rotenstein
  • I get the following error message: "NoCredentialsError: Unable to locate credentials". Do you know how I could use your script with no credentials? This S3 bucket is supposed to be accessible even without an AWS account (https://registry.opendata.aws/sentinel-2-l2a-cogs/). I was using the --no-sign-request flag with the AWS CLI because I do not have an AWS account yet. Is it mandatory to use boto3? – Nihilum Oct 12 '21 at 23:52
  • Try some of these methods: [Can I use boto3 anonymously?](https://stackoverflow.com/q/34865927/174777) – John Rotenstein Oct 13 '21 at 00:04
  • Thank you very much!! For those who want to do the same thing without credentials, add: `from botocore import UNSIGNED`, `from botocore.client import Config`, `from botocore.handlers import disable_signing`, then either `s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))` or `s3_resource = boto3.resource('s3')` followed by `s3_resource.meta.client.meta.events.register('choose-signer.s3.*', disable_signing)` – Nihilum Oct 13 '21 at 00:44