42

I am using boto and Python with Amazon S3.

If I use

[key.name for key in list(self.bucket.list())]

then I get all the keys of all the files.

mybucket/files/pdf/abc.pdf
mybucket/files/pdf/abc2.pdf
mybucket/files/pdf/abc3.pdf
mybucket/files/pdf/abc4.pdf
mybucket/files/pdf/new/
mybucket/files/pdf/new/abc.pdf
mybucket/files/pdf/2011/

What is the best way to

1. either get all the folders from S3,
2. or, from that list, strip the file name from the end of each key and get the unique folder keys?

I am thinking of doing it like this:

set([re.sub("/[^/]*$", "/", path) for path in mylist])
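
For example, applied to the listing above (a rough, untested sketch; the key list is hard-coded here):

import re

# hypothetical list of key names, as produced by the list comprehension above
mylist = [
    "mybucket/files/pdf/abc.pdf",
    "mybucket/files/pdf/new/abc.pdf",
    "mybucket/files/pdf/2011/",
]

# strip the trailing file name (or trailing empty segment) from each key
folders = set([re.sub("/[^/]*$", "/", path) for path in mylist])
print(folders)  # mybucket/files/pdf/, mybucket/files/pdf/new/, mybucket/files/pdf/2011/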
user1958218

9 Answers

49

Building on sethwm's answer:

To get the top level directories:

list(bucket.list("", "/"))

To get the subdirectories of files/:

list(bucket.list("files/", "/"))

and so on.
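
Each result is a boto.s3.prefix.Prefix object (see the comments below); its name attribute holds the path. A quick sketch, with a made-up bucket name:

import boto

# made-up bucket name; any boto connection method works here
conn = boto.connect_s3()
bucket = conn.get_bucket('mybucket')

# with a delimiter, sub-folders come back as boto.s3.prefix.Prefix objects
for entry in bucket.list("files/", "/"):
    print(entry.name)  # e.g. files/pdf/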

j1m

  • That's great and the docs certainly led me in that direction, but I don't seem to get a list of keys. Instead I get a list with a key and a `boto.s3.prefix.Prefix()` object, which I don't really know what to do with. Any ideas? – brice Feb 09 '15 at 16:42
  • bucket.list does yield a list of prefix objects. The `name` attribute is probably what you're looking for. – Evan Muehlhausen Sep 23 '15 at 21:14
  • It's important to note that to get the directories, the `prefix` (first parameter) should end with the delimiter. – Ciprian Tomoiagă Feb 27 '17 at 15:53
22

This is going to be an incomplete answer since I don't know python or boto, but I want to comment on the underlying concept in the question.

One of the other posters was right: there is no concept of a directory in S3. There are only flat key/value pairs. Many applications pretend certain delimiters indicate directory entries. For example "/" or "\". Some apps go as far as putting a dummy file in place so that if the "directory" empties out, you can still see it in list results.

You don't always have to pull your entire bucket down and do the filtering locally. S3 has a concept of a delimited list where you specify what you deem your path delimiter ("/", "\", "|", "foobar", etc.) and S3 will return virtual results to you, similar to what you want.

http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGET.html (Look at the delimiter parameter.)

This API will get you one level of directories. So if you had in your example:

mybucket/files/pdf/abc.pdf
mybucket/files/pdf/abc2.pdf
mybucket/files/pdf/abc3.pdf
mybucket/files/pdf/abc4.pdf
mybucket/files/pdf/new/
mybucket/files/pdf/new/abc.pdf
mybucket/files/pdf/2011/

If you passed in a LIST with prefix "" and delimiter "/", you'd get results:

mybucket/files/

If you passed in a LIST with prefix "mybucket/files/" and delimiter "/", you'd get results:

mybucket/files/pdf/

And if you passed in a LIST with prefix "mybucket/files/pdf/" and delimiter "/", you'd get results:

mybucket/files/pdf/abc.pdf
mybucket/files/pdf/abc2.pdf
mybucket/files/pdf/abc3.pdf
mybucket/files/pdf/abc4.pdf
mybucket/files/pdf/new/
mybucket/files/pdf/2011/

You'd be on your own at that point if you wanted to eliminate the pdf files themselves from the result set.

Now how you do this in python/boto I have no idea, but hopefully there's a way to pass these parameters through.
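
For reference, boto's bucket.list() does take prefix and delimiter arguments that appear to map onto the LIST calls above; a minimal sketch, assuming configured credentials and a bucket named mybucket:

import boto

# assumed bucket name; credentials come from the environment or config
conn = boto.connect_s3()
bucket = conn.get_bucket('mybucket')

# equivalent to a LIST with prefix "files/pdf/" and delimiter "/"
for entry in bucket.list(prefix='files/pdf/', delimiter='/'):
    print(entry.name)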

perpetual_check
20

As pointed out in one of the comments, the approach suggested by j1m returns a Prefix object. If you are after a name/path, you can use its name attribute. For example:

import boto
import boto.s3

conn = boto.s3.connect_to_region('us-west-2')
bucket = conn.get_bucket('your-bucket-name')

folders = bucket.list("","/")
for folder in folders:
    print folder.name
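
If files sit at the same level as the folders, the listing mixes Key and Prefix objects; a type check separates them (a small sketch, continuing from the code above):

from boto.s3.prefix import Prefix

for entry in bucket.list("", "/"):
    # Prefix objects are the "folders"; everything else is an actual key
    if isinstance(entry, Prefix):
        print 'folder: ' + entry.name
    else:
        print 'file:   ' + entry.name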
Wawrzek
  • If you want to get all of your buckets, you can wrap the above in `buckets = conn.get_all_buckets()` and then `for bucket in buckets:`, continuing with the `bucket.list()` as before, e.g. `buckets = S3Connection().get_all_buckets()` then `for bucket in buckets: for folder in bucket.list(): print folder.name` – cgseller Jul 30 '15 at 21:27
19

I found the following to work using boto3:

import boto3
def list_folders(s3_client, bucket_name):
    # with a Delimiter, S3 returns folder-like groupings in 'CommonPrefixes'
    response = s3_client.list_objects_v2(Bucket=bucket_name, Prefix='', Delimiter='/')
    for content in response.get('CommonPrefixes', []):
        yield content.get('Prefix')

s3_client = boto3.client('s3')
bucket_name = 'my-bucket-name'  # substitute your bucket name
folder_list = list_folders(s3_client, bucket_name)
for folder in folder_list:
    print('Folder found: %s' % folder)
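
Note that list_objects_v2 returns at most 1000 entries per response; for larger buckets a paginator follows the continuation tokens automatically (a sketch along the same lines):

import boto3

def list_folders_paginated(s3_client, bucket_name, prefix=''):
    # the paginator transparently issues follow-up requests for each page
    paginator = s3_client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix, Delimiter='/'):
        for common_prefix in page.get('CommonPrefixes', []):
            yield common_prefix.get('Prefix')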

LucyDrops
11

Basically, there is no such thing as a folder in S3. Internally everything is stored as a key, and if the key name contains a slash character, clients may choose to display it as a folder.

With that in mind, you should first get all the keys and then use a regex to filter out the paths that include a slash. The solution you have right now is already a good start.
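
A sketch of that filtering step (pure string handling, no boto calls; keys is assumed to hold the full key list):

def folder_paths(keys):
    # every ancestor prefix of every key counts as a "folder":
    # a/b/c.pdf contributes both a/ and a/b/
    folders = set()
    for key in keys:
        parts = key.split('/')[:-1]  # drop the file name (or empty last segment)
        for depth in range(1, len(parts) + 1):
            folders.add('/'.join(parts[:depth]) + '/')
    return folders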

j0nes
7

I see you have successfully made the boto connection. If you only have one directory that you are interested in (like in the example you provided), I think what you can do is use the prefix and delimiter parameters that AWS already provides (Link).

Boto uses this feature in its bucket object, and you can retrieve hierarchical directory information using prefix and delimiter. bucket.list() will return a boto.s3.bucketlistresultset.BucketListResultSet object.

I tried this a couple of ways, and if you do choose to use a delimiter= argument in bucket.list(), the returned object is an iterator for boto.s3.prefix.Prefix, rather than boto.s3.key.Key. In other words, if you try to retrieve the subdirectories you should put delimiter='/', and as a result you will get an iterator for the Prefix objects.

Both returned objects (either prefix or key object) have a .name attribute, so if you want the directory/file information as a string, you can do so by printing like below:

from boto.s3.connection import S3Connection

key_id = '...'
secret_key = '...'

# Create connection
conn = S3Connection(key_id, secret_key)

# Get list of all buckets
allbuckets = conn.get_all_buckets()
for bucket_obj in allbuckets:
    print(bucket_obj.name)

# Connect to a specific bucket
bucket = conn.get_bucket('bucket_name')

# Get subdirectory info
for key in bucket.list(prefix='sub_directory/', delimiter='/'):
    print(key.name)
Erica Jh Lee

  • Whilst this code snippet is welcome, and may provide some help, it would be [greatly improved if it included an explanation](//meta.stackexchange.com/q/114762) of *how* and *why* this solves the problem. Remember that you are answering the question for readers in the future, not just the person asking now! Please [edit] your answer to add explanation, and give an indication of what limitations and assumptions apply. – Toby Speight Apr 06 '17 at 17:15
  • @TobySpeight, I added some additional information. Thank you for your comment. – Erica Jh Lee Apr 06 '17 at 19:21
3

The issue here, as has been said by others, is that a folder doesn't necessarily have a key, so you have to search through the strings for the / character and figure out your folders through that. Here's one way to generate a recursive dictionary imitating a folder structure.

If you want all the files and their URLs in the folders:

# build a nested dict mirroring the "folder" structure
assets = {}
for key in self.bucket.list(str(self.org) + '/'):
    path = key.name.split('/')

    # walk (and create, if needed) a nested dict per path component
    identifier = assets
    for uri in path[1:-1]:
        try:
            identifier[uri]
        except KeyError:
            identifier[uri] = {}
        identifier = identifier[uri]

    # keys that don't end in '/' are files; store their public URLs
    if not key.name.endswith('/'):
        identifier[path[-1]] = key.generate_url(expires_in=0, query_auth=False)

return assets

If you just want the folders themselves:

# build a nested dict of just the folders
folders = {}
for key in self.bucket.list(str(self.org) + '/'):
    path = key.name.split('/')

    # walking path[1:-1] creates an entry for every folder level,
    # including explicit folder keys (names ending in '/', whose
    # final split element is an empty string and can be ignored)
    identifier = folders
    for uri in path[1:-1]:
        try:
            identifier[uri]
        except KeyError:
            identifier[uri] = {}
        identifier = identifier[uri]

return folders

This can then be recursively read out later.
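
For example, a sketch of such a read-out (assuming the dictionary built above, where nested dicts are folders and string values are URLs):

def print_tree(node, indent=0):
    # nested dicts are sub-folders; anything else is a file entry
    for name, value in sorted(node.items()):
        if isinstance(value, dict):
            print(' ' * indent + name + '/')
            print_tree(value, indent + 2)
        else:
            print(' ' * indent + name + ' -> ' + str(value))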

Nathan Hazzard
0

The boto interface allows you to list the contents of a bucket while giving a prefix. That way you can get the entries for what would be a directory in a normal filesystem:

import boto

AWS_ACCESS_KEY_ID = '...'
AWS_SECRET_ACCESS_KEY = '...'

conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
bucket = conn.get_bucket('your-bucket-name')  # get_bucket() needs the bucket name

# note: S3 key names do not normally start with a '/'
bucket_entries = bucket.list(prefix='path/to/your/directory/')

for entry in bucket_entries:
    print entry
bambata
-1

Complete example with boto3, using the S3 client:

import boto3


def list_bucket_keys(bucket_name):
    s3_client = boto3.client("s3")
    """ :type : pyboto3.s3 """
    # Delimiter="/" makes S3 group the keys under the prefix into CommonPrefixes
    result = s3_client.list_objects(Bucket=bucket_name, Prefix="Trails/", Delimiter="/")
    return result['CommonPrefixes']


if __name__ == '__main__':
    print(list_bucket_keys("my-s3-bucket-name"))
joeButler