
I want to calculate the size of every S3 bucket and generate a result like this:

Bucket_name total size
('bucket_A ', 0)
('Bucket_B', 51090)

This is what I have tried so far:

import boto3

s3 = boto3.resource('s3')
for mybucket in s3.buckets.all():
    # Sum the size of every object in the bucket, then print one line per bucket
    mybucket_size = sum(obj.size for obj in s3.Bucket(mybucket.name).objects.all())
    print(mybucket.name, mybucket_size)
  • If you need the storage info for monitoring purposes, use the S3 inventory service. https://docs.aws.amazon.com/AmazonS3/latest/dev/storage-inventory.html#storage-inventory-how-to-set-up – mootmoot May 14 '19 at 13:28

2 Answers


Amazon CloudWatch automatically collects metrics on Amazon S3, including BucketSizeBytes:

The amount of data in bytes stored in a bucket in the STANDARD storage class, INTELLIGENT_TIERING storage class, Standard - Infrequent Access (STANDARD_IA) storage class, OneZone - Infrequent Access (ONEZONE_IA), Reduced Redundancy Storage (RRS) class, or Glacier (GLACIER) storage class. This value is calculated by summing the size of all objects in the bucket (both current and noncurrent objects), including the size of all parts for all incomplete multipart uploads to the bucket.

See: Monitoring Metrics with Amazon CloudWatch - Amazon Simple Storage Service
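
For illustration, a rough sketch of querying that metric with boto3 might look like this (assuming the STANDARD storage class via the StandardStorage dimension, and the once-per-day reporting period):

import datetime
import boto3

cloudwatch = boto3.client('cloudwatch')
s3 = boto3.client('s3')
now = datetime.datetime.utcnow()

for bucket in s3.list_buckets()['Buckets']:
    # BucketSizeBytes is published roughly once per day, per storage class
    stats = cloudwatch.get_metric_statistics(
        Namespace='AWS/S3',
        MetricName='BucketSizeBytes',
        Dimensions=[
            {'Name': 'BucketName', 'Value': bucket['Name']},
            {'Name': 'StorageType', 'Value': 'StandardStorage'},  # assumption: STANDARD class only
        ],
        StartTime=now - datetime.timedelta(days=2),
        EndTime=now,
        Period=86400,
        Statistics=['Average'],
    )
    datapoints = stats['Datapoints']
    latest = max(datapoints, key=lambda d: d['Timestamp']) if datapoints else None
    print(bucket['Name'], int(latest['Average']) if latest else 0)

The metric is split by storage class, so buckets that also use other classes would need extra queries with different StorageType values.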

John Rotenstein

Just iterate over all objects and sum up their sizes:

sum(obj.size for obj in boto3.resource('s3').Bucket('mybucket').objects.all())

EDIT:

If you want it to be faster you would have to use a different approach. The method above lists the bucket page by page (up to 1,000 keys per HTTP request), so the number of requests grows linearly with the number of objects in the bucket; the listing itself cannot be sped up, unfortunately.
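
For reference, a single pass over all buckets with a paginator could look like the sketch below (a reconstruction, not necessarily faster, since it still spends one request per 1,000 keys):

import boto3

s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')

for bucket in s3.list_buckets()['Buckets']:
    total = 0
    # Each page holds up to 1,000 keys; empty buckets return no 'Contents'
    for page in paginator.paginate(Bucket=bucket['Name']):
        total += sum(obj['Size'] for obj in page.get('Contents', []))
    print(bucket['Name'], total)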

You can, however, use third-party tools like s4cmd, which is faster than the first approach.

s4cmd du s3://bucket-name

Or use -r if you want to include the size of subdirectories:

s4cmd du -r s3://bucket-name
Josef Korbel
  • `import boto3 total_size = 0 s3=boto3.resource('s3') for mybucket in s3.buckets.all(): mybucket_size=sum([object.size for object in boto3.resource('s3').Bucket(mybucket.name).objects.all()]) print (mybucket.name, mybucket_size)` – Rockinroll May 14 '19 at 13:07
  • Help me improve the code so that it fetches the result faster; the response time should be low. – Rockinroll May 14 '19 at 13:09