
I'm new to Python and boto, and I'm currently trying to write a DAG that will check an S3 file's size given the bucket location and file name. How can I take the file location (s3://bucket-info/folder/filename) and get the size of the file? If the file size is greater than 0 KB, I will need to fail the job.

Thank you for your time

JMV12
  • Does this answer your question? [How do I get the file / key size in boto S3?](https://stackoverflow.com/questions/5315603/how-do-i-get-the-file-key-size-in-boto-s3) – Simba Feb 05 '20 at 08:45

2 Answers


You can use boto3's `head_object` for this.

Here's something that will get you the size; replace the bucket and key with your own values:

import boto3

client = boto3.client(service_name='s3', use_ssl=True)

# head_object fetches the object's metadata without downloading the body
response = client.head_object(
    Bucket='bucketname',
    Key='full/path/to/file.jpg'
)
print(response['ContentLength'])  # size of the object in bytes
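Since the question starts from a full `s3://` URI, you also need to split it into the bucket and key that `head_object` expects. A minimal sketch using the standard library (the `parse_s3_uri` helper name is my own):

```python
from urllib.parse import urlparse

def parse_s3_uri(uri):
    """Split an s3://bucket/key URI into a (bucket, key) tuple."""
    parsed = urlparse(uri)
    # netloc is the bucket; path carries a leading '/' we strip off
    return parsed.netloc, parsed.path.lstrip('/')

bucket, key = parse_s3_uri('s3://bucket-info/folder/filename')
print(bucket)  # bucket-info
print(key)     # folder/filename
```

You can then pass `bucket` and `key` straight into `client.head_object(...)` as above, and raise an exception when `response['ContentLength']` is greater than 0 — an uncaught exception is what marks an Airflow task as failed.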
Ninad Gaikwad

You can also get a list of all objects if multiple files need to be checked. For a given bucket, run `list_objects_v2` and then iterate through the response's 'Contents'. For example:

import boto3

s3_client = boto3.client('s3')
response_contents = s3_client.list_objects_v2(
    Bucket='name_of_bucket'
).get('Contents')

You'll get a list of dictionaries like this:

[{'Key': 'path/to/object1', 'LastModified': datetime, 'ETag': '"some etag"', 'Size': 2600, 'StorageClass': 'STANDARD'}, {'Key': 'path/to/object2', 'LastModified': datetime, 'ETag': '"some etag"', 'Size': 454, 'StorageClass': 'STANDARD'}, ... ]

Notice that each dictionary in the list contains a 'Size' key, which holds the size of that object in bytes. The list is iterable:

for rc in response_contents:
    if rc.get('Key') == 'path/to/file':
        print(f"Size: {rc.get('Size')}")

This prints the sizes of all the files you're interested in:

Size: 2600
Size: 454
Size: 2600
...
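Note that `list_objects_v2` returns at most 1,000 keys per call; for bigger buckets you can use boto3's `get_paginator('list_objects_v2')` to walk every page. The size lookup itself can be factored into a small helper — a sketch, where the `sizes_by_key` name and the sample data are illustrative:

```python
def sizes_by_key(contents, keys_of_interest):
    """Map each wanted object key to its 'Size' from list_objects_v2 'Contents' entries."""
    wanted = set(keys_of_interest)
    return {obj['Key']: obj['Size'] for obj in contents if obj['Key'] in wanted}

# Sample 'Contents' entries shaped like the response above (illustrative data)
sample_contents = [
    {'Key': 'path/to/object1', 'Size': 2600, 'StorageClass': 'STANDARD'},
    {'Key': 'path/to/object2', 'Size': 454, 'StorageClass': 'STANDARD'},
]

print(sizes_by_key(sample_contents, ['path/to/object1']))  # {'path/to/object1': 2600}
```

With a real bucket you would feed it each page's 'Contents', e.g. `for page in s3_client.get_paginator('list_objects_v2').paginate(Bucket='name_of_bucket'): sizes_by_key(page.get('Contents', []), keys)`.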
Leo Skhrnkv