27

I can grab and read all the objects in my AWS S3 bucket via

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket')
all_objs = bucket.objects.all()
for obj in all_objs:
    pass
    # filter only the objects I need

and then

obj.key

would give me the path within the bucket.

Is there a way to filter beforehand for only those objects whose keys start with a given path (a directory in the bucket), so that I can avoid looping over all the objects and filtering afterwards?

mar tin

3 Answers

58

Use the filter method of a collection such as bucket.objects:

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket')
# Only objects whose keys start with 'myprefix' are listed
objs = bucket.objects.filter(Prefix='myprefix')
for obj in objs:
    pass
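
As a rough sketch of how this might be wrapped for directory-like prefixes: the helper below (the names `keys_under` and `directory` are hypothetical, not part of boto3) appends a trailing slash so that a prefix such as `logs` does not also match `logs-archive/...`, then yields the keys returned by `bucket.objects.filter`. It takes the bucket as an argument, so it assumes you have already created one with `boto3.resource('s3').Bucket(...)`.

```python
def keys_under(bucket, directory):
    """Yield the keys of objects stored under a directory-like prefix.

    `bucket` is assumed to be a boto3 Bucket resource, e.g.
    boto3.resource('s3').Bucket('my-bucket').
    """
    # S3 compares prefixes character by character, so add a trailing
    # slash to avoid matching sibling names that share the same start.
    prefix = directory
    if prefix and not prefix.endswith('/'):
        prefix += '/'
    for obj in bucket.objects.filter(Prefix=prefix):
        yield obj.key
```

Something like `list(keys_under(bucket, 'myprefix'))` would then collect the keys under `myprefix/`. Passing an empty string leaves the prefix empty, which lists the whole bucket.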
Ilja Everilä
  • 5
    Does this require pagination like `list_objects` does, or does it return unlimited results? – LondonRob Apr 21 '20 at 17:28
  • 1
    It would seem that `filter()` handles that for you behind the scenes: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/collections.html#filtering. Given its API that is kind of a given; how else could it be just a generator yielding results? – Ilja Everilä Apr 21 '20 at 17:31
7

For folks using boto3.client('s3') rather than boto3.resource('s3'), you can use the 'Prefix' key to restrict the listing to objects whose keys start with a given prefix:

import boto3

s3 = boto3.client('s3')

params = {
    "Bucket": "HelloWorldBucket",
    "Prefix": "Happy"
}

happy_objects = s3.list_objects_v2(**params)

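The above code snippet will list objects in 'HelloWorldBucket' whose keys begin with 'Happy', returning at most 1,000 of them per call. To match only the contents of the 'Happy' folder, use the prefix 'Happy/'.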

PS: A folder in S3 is just a construct; it is implemented as a prefix of the object key.
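
Since a single `list_objects_v2` call returns at most 1,000 keys, a larger listing needs pagination. Here is a minimal sketch using the client's built-in paginator; the helper name `iter_keys` and its parameters are made up for illustration, and `client` is assumed to be something you created with `boto3.client('s3')`.

```python
def iter_keys(client, bucket_name, prefix):
    """Yield every key that starts with `prefix`, following pagination.

    `client` is assumed to be a boto3 S3 client, e.g. boto3.client('s3').
    """
    paginator = client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        # 'Contents' is absent from a page when nothing matches.
        for item in page.get('Contents', []):
            yield item['Key']
```

With the example above, `list(iter_keys(s3, 'HelloWorldBucket', 'Happy/'))` should collect all matching keys rather than only the first page.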

Gru
2

If we just need a list of object keys, then bucket.objects.filter is a better alternative to list_objects or list_objects_v2, as those functions return at most 1,000 objects per call. Reference: list_objects_v2

Lavesh
  • 1
    Not different from existing answer. – NVS Abhilash Aug 14 '20 at 20:54
  • 5
    I am not allowed to add comments, so I submitted this as an answer in support of the existing answer and to add a little clarity about the limitation of list_objects/list_objects_v2. I am sure it will help others. – Lavesh Aug 14 '20 at 22:56