104

I need to list all files in a certain folder of my S3 bucket.

The folder structure is the following:

/my-bucket/users/<user-id>/contacts/<contact-id>

I have files related to users and files related to a certain user's contact. I need to list both.

To list files I'm using this code:

ListObjectsRequest listObjectsRequest = new ListObjectsRequest().withBucketName("my-bucket")
                .withPrefix("some-prefix").withDelimiter("/");
ObjectListing objects = transferManager.getAmazonS3Client().listObjects(listObjectsRequest);

To list a certain user's files I'm using this prefix:

users/<user-id>/

and I'm correctly getting all files in the directory, excluding the contacts subdirectory, for example:

users/<user-id>/file1.txt
users/<user-id>/file2.txt
users/<user-id>/file3.txt

To list the files of a certain user's contact, I'm instead using this prefix:

users/<user-id>/contacts/<contact-id>/

but in this case I'm also getting the directory itself as a returned object:

users/<user-id>/contacts/<contact-id>/file1.txt
users/<user-id>/contacts/<contact-id>/file2.txt
users/<user-id>/contacts/<contact-id>/

Why am I getting this behaviour? What's different between the two listing requests? I need to list only the files in the directory, excluding sub-directories.

davioooh
  • This behavior would be expected if you actually created the "empty folder" in the console, because that action actually creates an empty object with the key `path/to/my/folder/` so the console has a placeholder. Did you do that, while testing? – Michael - sqlbot Jun 27 '16 at 12:47
  • @Michael-sqlbot I didn't create any empty folder. In fact, all files are uploaded by the application using the folder structure I reported as the prefix for the file key. – davioooh Jun 27 '16 at 12:50
  • You might want to try a `GET` on the apparent object with trailing slash, then, because if you didn't create a folder and you did use the `/` delimiter `withDelimiter("/")` when listing the objects, this should mean that you do in fact have an object named with a trailing slash, possibly due to a bug in your code that created one that way. Such an object would likely be invisible in the console. – Michael - sqlbot Jun 27 '16 at 13:02
  • Here is the code: http://codeflex.co/get-list-of-objects-from-s3-directory/ – John Detroit Apr 11 '18 at 08:19
  • Indeed Michael is right, there is an object with that key in your bucket. Run this command to remove it `aws s3api delete-object --bucket X --key path/to/my/folder/`. And make sure your code doesn't create that object again. – Sarsaparilla Jan 22 '20 at 22:00
  • If you are not able to list objects/files in a specific folder, you can check this: https://stackoverflow.com/a/68481553/8874958 – Kishan Solanki Aug 03 '21 at 04:08

9 Answers

74

Everybody says that there are no directories and files in S3, only objects (and buckets), which is absolutely true. Still, I would suggest taking advantage of CommonPrefixes, described in this answer. You can do the following to get the list of "folders" (commonPrefixes) and "files" (objectSummaries):

ListObjectsV2Request req = new ListObjectsV2Request()
        .withBucketName(bucket.getName())
        .withPrefix(prefix)
        .withDelimiter(DELIMITER);
ListObjectsV2Result listing = s3Client.listObjectsV2(req);
for (String commonPrefix : listing.getCommonPrefixes()) {
    System.out.println(commonPrefix);
}
for (S3ObjectSummary summary : listing.getObjectSummaries()) {
    System.out.println(summary.getKey());
}

In your case, with the correct prefix, objectSummaries (files) should contain:
users/user-id/contacts/contact-id/file1.txt
users/user-id/contacts/contact-id/file2.txt

and commonPrefixes:
users/user-id/contacts/contact-id/

Reference: https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html
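
To make the split between objects and common prefixes easier to see, here is a minimal local sketch of the grouping rule that a delimiter-based listing applies. It makes no AWS calls: the class DelimiterListingSketch, its Listing holder, and the sample keys are hypothetical names used only for illustration, and the real service also sorts and paginates results, which is ignored here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Local simulation of how a prefix + delimiter listing groups keys.
// Hypothetical helper for illustration only; it does not talk to S3.
class DelimiterListingSketch {

    // Keys listed directly ("files") and rolled-up prefixes ("folders").
    static final class Listing {
        final List<String> contents = new ArrayList<>();
        final TreeSet<String> commonPrefixes = new TreeSet<>();
    }

    static Listing list(List<String> keys, String prefix, String delimiter) {
        Listing result = new Listing();
        for (String key : keys) {
            if (!key.startsWith(prefix)) {
                continue; // outside the requested prefix
            }
            String rest = key.substring(prefix.length());
            int idx = rest.indexOf(delimiter);
            if (idx < 0) {
                // No delimiter after the prefix: the key is listed as an object.
                // This includes a key that is exactly equal to the prefix.
                result.contents.add(key);
            } else {
                // Delimiter found: roll the key up into a common prefix.
                result.commonPrefixes.add(prefix + rest.substring(0, idx + delimiter.length()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keys = List.of(
                "users/u1/file1.txt",
                "users/u1/contacts/c1/",           // zero-byte placeholder object
                "users/u1/contacts/c1/file1.txt");

        Listing user = list(keys, "users/u1/", "/");
        System.out.println(user.contents);         // [users/u1/file1.txt]
        System.out.println(user.commonPrefixes);   // [users/u1/contacts/]

        Listing contact = list(keys, "users/u1/contacts/c1/", "/");
        System.out.println(contact.contents);      // placeholder key plus file1.txt
    }
}
```

In this sketch the placeholder key shows up among the objects only when the prefix is exactly that key, which mirrors the difference between the two listings in the question.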

Adarsh Madrecha
Victor Kim
57

Everything in S3 is an object. To you, it may be files and folders. But to S3, they're just objects.

Objects whose keys end with the delimiter (/ in most cases) are usually perceived as folders, but that's not always the case. It depends on the application. Again, in your case, you're interpreting it as a folder. S3 is not. It's just another object.

In your case above, the object users/<user-id>/contacts/<contact-id>/ exists in S3 as a distinct object, but the object users/<user-id>/ does not. That's the difference in your responses. Why they're like that, we cannot tell you, but someone made the object in one case, and didn't in the other. You don't see it in the AWS Management Console because the console is interpreting it as a folder and hiding it from you.

Since S3 just sees these things as objects, it won't "exclude" certain things for you. It's up to the client to deal with the objects as they should be dealt with.

Your Solution

Since you're the one who doesn't want the folder objects, you can exclude them yourself by checking whether the last character of each key is a /. If it is, ignore that object in the response.
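
As a minimal sketch of that check, assuming the keys have already been collected into a list of strings (for example from getKey() on each returned object summary): the class FolderKeyFilter and the method filesOnly are hypothetical names, not part of the AWS SDK.

```java
import java.util.List;
import java.util.stream.Collectors;

// Client-side filter that drops "folder" placeholder keys.
// Hypothetical helper for illustration; not an AWS SDK class.
class FolderKeyFilter {

    // Keep only keys that do not end with the "/" delimiter.
    static List<String> filesOnly(List<String> keys) {
        return keys.stream()
                .filter(key -> !key.endsWith("/"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> keys = List.of(
                "users/u1/contacts/c1/file1.txt",
                "users/u1/contacts/c1/file2.txt",
                "users/u1/contacts/c1/");
        // Prints the two file keys and omits the trailing-slash key.
        filesOnly(keys).forEach(System.out::println);
    }
}
```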

Hearen
Matt Houser
5

If your goal is to get only the files and not the folder, the approach I took was to use the file size as a filter. This property is the current size of the file hosted by AWS, and all the folders return 0 for it. The following C# code uses LINQ, but it shouldn't be hard to translate to Java.

using System.Linq;
using Amazon.S3;
using Amazon.S3.Model;

var amazonClient = new AmazonS3Client(key, secretKey, region);
var listObjectsRequest = new ListObjectsRequest
{
    BucketName = "someBucketName",
    Delimiter = "someDelimiter",
    Prefix = "somePrefix"
};
var objects = amazonClient.ListObjects(listObjectsRequest);
// Keep only objects with a non-zero size; folder placeholders report 0.
var objectsInFolder = objects.S3Objects.Where(file => file.Size > 0).ToList();
Nahuelgrc
  • A reasonable answer although the purist in me says a file is a file even if it is zero bytes. File names can't end with a '/' and file names cannot be zero length - I think they are better decision makers than size – Oly Dungey Mar 11 '22 at 11:03
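
The trade-off raised in the comment can be sketched in Java with a small stand-in type. Entry, bySize, and byKey are hypothetical names, not AWS SDK types; the sample data assumes a bucket that holds a legitimately empty file next to a folder placeholder.

```java
import java.util.ArrayList;
import java.util.List;

// Compares the size-based filter with the trailing-slash filter.
// Hypothetical helper for illustration; Entry is not an AWS SDK type.
class SizeVersusKeyFilter {

    // Minimal stand-in for an object summary: key plus size in bytes.
    static final class Entry {
        final String key;
        final long size;

        Entry(String key, long size) {
            this.key = key;
            this.size = size;
        }
    }

    // Size-based filter: keeps entries with a non-zero size.
    static List<String> bySize(List<Entry> entries) {
        List<String> kept = new ArrayList<>();
        for (Entry e : entries) {
            if (e.size > 0) {
                kept.add(e.key);
            }
        }
        return kept;
    }

    // Key-based filter: keeps entries whose key does not end with "/".
    static List<String> byKey(List<Entry> entries) {
        List<String> kept = new ArrayList<>();
        for (Entry e : entries) {
            if (!e.key.endsWith("/")) {
                kept.add(e.key);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
                new Entry("f1/", 0),           // folder placeholder
                new Entry("f1/empty.txt", 0),  // real but empty file
                new Entry("f1/a.txt", 12));
        System.out.println(bySize(entries));   // [f1/a.txt]
        System.out.println(byKey(entries));    // [f1/empty.txt, f1/a.txt]
    }
}
```

With this data the size filter also drops the empty file, while the key check keeps it, so the choice depends on whether zero-byte files can occur in the bucket.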
3

You can check the content type: S3 has a special application/x-directory type.

bucket.objects({:delimiter=>"/", :prefix=>"f1/"}).each { |obj| p obj.object.content_type }
Yaroslav Malyk
1

As others have already said, everything in S3 is an object. To you, it may be files and folders. But to S3, they're just objects.

If you don't need the objects whose keys end with a '/', you can safely delete them, e.g. via the REST API or the AWS Java SDK (I assume you have write access). You will not lose "nested files": there are no files, so deleting the key does not remove the objects whose names are prefixed with it.

AmazonS3 amazonS3 = AmazonS3ClientBuilder.standard().withCredentials(new ProfileCredentialsProvider()).withRegion("region").build();
amazonS3.deleteObject(new DeleteObjectRequest("my-bucket", "users/<user-id>/contacts/<contact-id>/"));

Please note that I'm using ProfileCredentialsProvider so that my requests are not anonymous; otherwise, you will not be able to delete an object. I have my AWS keys stored in the ~/.aws/credentials file.

BartoszMiller
0

In the AWS SDK for Java 2.x you can use this code:

ListObjectsV2Request req = ListObjectsV2Request.builder()
        .bucket("bucketName")
        .prefix(prefix)
        .delimiter("/")
        .build();

ListObjectsV2Response listing = s3Client.listObjectsV2(req);

for (S3Object object: listing.contents()) {
    System.out.println(object.key());
}

If your prefix is something like Images/602cef3dd96bda7c2e97b8ad/, the output will be:

Images/602cef3dd96bda7c2e97b8ad/
Images/602cef3dd96bda7c2e97b8ad/06136e02_20220501.jpg
Images/602cef3dd96bda7c2e97b8ad/0638da47_20220501.jpg
Images/602cef3dd96bda7c2e97b8ad/142e98f1_20220501.jpg
Images/602cef3dd96bda7c2e97b8ad/160ca9f1_20220501.jpg
Jamalianpour
-1

S3 does not have directories. While you can list files in a pseudo-directory manner like you demonstrated, there is no directory "file" per se.
You may have inadvertently created a data file called users/<user-id>/contacts/<contact-id>/.

Magnus
  • I can't see any `users/<user-id>/contacts/<contact-id>/` file in my management console. But if it exists, how can I exclude it? – davioooh Jun 27 '16 at 10:57
-1

Maybe helpful for Rust aws-sdk-s3 users:

  let mut output = s3_client.list_objects_v2()
    .bucket("bucketname")
    //limit listing to bucketname/The/Parent/Object/Name
    .prefix("The/Parent/Object/Name/") 
    .send().await?;
  
  loop{
    if let Some(objects) = output.contents() {
      for object in objects {
        println!("Object: {:}", object.key().unwrap())
      }
    }
  
    if output.is_truncated() {
      output = s3_client.list_objects_v2()
               .bucket("bucketname")
               //limit listing to bucketname/The/Parent/Object/Name
               .prefix("The/Parent/Object/Name/") 
               .continuation_token(output.next_continuation_token().unwrap())
               .send().await?
    } else {
      break;
    }
  }
yerlilbilgin
-3

Based on @davioooh's answer, this code worked for me:

ListObjectsRequest listObjectsRequest = new ListObjectsRequest().withBucketName("your-bucket")
            .withPrefix("your/folder/path/").withDelimiter("/");
TuanDPH