
Is there a way to programmatically find zero bytes file in Amazon S3?

The total size of the bucket is more than 100 GB, so it is
impractical for me to sync it back to a server and then run a

find . -size 0 -type f
ajreal

6 Answers


Combining s3cmd with awk should do the trick easily.

Note: s3cmd outputs four columns: date, time, size, and name. You want to match the size (column 3) against 0 and print the object name (column 4):

$ s3cmd ls -r s3://bucketname | awk '{if ($3 == 0) print $4}'
s3://bucketname/root/
s3://bucketname/root/e

If you want to see all information, just drop the $4 so that it only says print.

$ s3cmd ls -r s3://bucketname | awk '{if ($3 == 0) print}' 
2013-03-04 06:28         0   s3://bucketname/root/
2013-03-04 06:28         0   s3://bucketname/root/e

Memory-wise, this should be fine as it's a simple bucket listing.
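The awk filter can be checked locally against a sample listing (the lines below imitate `s3cmd ls -r` output and are made up). Note one caveat: because awk splits on whitespace, this column-based match breaks for object keys that contain spaces.

```shell
# Feed two fabricated listing lines through the same awk filter
# and keep only the zero-size keys.
printf '%s\n' \
  '2013-03-04 06:28         0   s3://bucketname/root/e' \
  '2013-03-04 06:28      1024   s3://bucketname/root/f' |
awk '{if ($3 == 0) print $4}'
# prints: s3://bucketname/root/e
```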

MeSee
    This also works with the awscli package. The syntax would be `aws s3 ls --recursive s3://bucketname | awk '{if ($3 == 0) print $4}'` – Foosh May 09 '16 at 20:43

There is no direct way to search for zero-byte files in Amazon S3. You can, however, list all objects and sort them by size, which groups all the zero-byte files together.

If you want a list of all zero-byte files, you can use Bucket Explorer: list the objects of the selected bucket, then click the size column header to sort by size, which groups the zero-byte files together.

Disclosure: I am a developer of Bucket Explorer.

Tej Kiran

Just use Boto:

from boto.s3.connection import S3Connection

aws_access_key = ''
aws_secret_key = ''
bucket_name = ''

s3_conn = S3Connection(aws_access_key, aws_secret_key)
bucket = s3_conn.get_bucket(bucket_name)
for key in bucket.list():
    if key.size == 0:
        print(key.key)

Regarding the number of files: Boto requests only the object metadata (not the actual file contents), 1,000 keys at a time (the AWS limit), and `bucket.list()` is a generator, so memory usage stays small.

TRiG
Derrick Petzold

JMESPath query:

aws s3api list-objects --bucket $BUCKET --prefix $PREFIX --output json --query 'Contents[?Size==`0`]'

Finds zero-length files using rudimentary pattern matching:

hdfs dfs -ls -R s3a://bucket_path/ | grep '^-' | awk -F " " '{if ($4 == 0) print $4, $7}'
rollstuhlfahrer
Using the AWS SDK for JavaScript (v2):

const AWS = require("aws-sdk");
const s3 = new AWS.S3();

const getBucketFileSize = async function () {
  try {
    const response = await s3
      .listObjectsV2({
        Bucket: "bucket-name", // your bucket name
        Prefix: "prefix/",     // optional: provide a bucket prefix if available
      })
      .promise();

    response.Contents.forEach((item) => {
      if (item.Size === 0) {
        console.log(item);
      }
    });
  } catch (e) {
    console.log("err", e);
  }
};
  • You can use the listObject method available for S3 Bucket by using aws-sdk package – moham_arshed Mar 29 '22 at 09:13