71

There must be an easy way to get the file size (key size) without pulling over a whole file. I can see it in the Properties of the AWS S3 browser. And I think I can get it off the "Content-length" header of a "HEAD" request. But I'm not connecting the dots about how to do this with boto. Extra kudos if you post a link to some more comprehensive examples than are in the standard boto docs.

EDIT: So the following seems to do the trick (though from looking at the source code I'm not completely sure):

import boto
import boto.s3.key

conn = boto.connect_s3()
bk = conn.get_bucket('my_bucket_name')
ky = boto.s3.key.Key(bk, 'my_key_name')
ky.open_read()  ## This sends a GET request.
print(ky.size)

For now I'll leave the question open for comments, better solutions, or pointers to examples.

mjhm

6 Answers

79

This would work:

bk = conn.get_bucket('my_bucket_name')
key = bk.lookup('my_key_name')
print(key.size)

The lookup method simply does a HEAD request on the bucket for the key name, so it returns all of the headers (including Content-Length) for the key but does not transfer any of the key's actual content.

The S3 tutorial mentions this but not very explicitly and not in this exact context. I'll add a section on this to help make it easier to find.

Note: for every old link like http://boto.cloudhackers.com/s3_tut.html that returns a 404, add in "/en/latest" right after the ".com" : http://boto.cloudhackers.com/en/latest/s3_tut.html . (Someone needs to explore mod_rewrite...)

garnaat
  • Thanks for the response and developing Boto in the first place. I'd be tearing out my hair without it. – mjhm Apr 01 '11 at 00:49
  • I would also recommend checking "if key is None" first. – meawoppl Jun 03 '15 at 19:13
  • A much better place to find the boto documentation now is https://boto.readthedocs.org/ – garnaat Jun 03 '15 at 20:38
  • This solution uses Boto 2.x. In Boto3 the API is different. See [AWS Docs](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/migrations3.html#accessing-a-bucket) – rvernica Nov 28 '20 at 05:11
78

In boto3:

s3.head_object also performs a HEAD request to retrieve the metadata about the object:

import boto3

s3 = boto3.client('s3')
response = s3.head_object(Bucket='bucketname', Key='keyname')
size = response['ContentLength']
Kristian
  • This seems to produce incorrect results for me, I believe this is due to KMS encryption. Just something to be aware of. @satznova's Answer worked for me in this case. – tjheslin1 Dec 16 '20 at 15:57
27

In Boto 3:

Using the S3 Object resource you can fetch the file (a.k.a. object) size in bytes. It is a resource representing the Amazon S3 Object.

In fact you can get all metadata related to the object: content_length (the object size), content_language (the language the content is in), content_encoding, last_modified, etc.

import boto3

s3 = boto3.resource('s3')
object = s3.Object('bucket_name', 'key')
file_size = object.content_length  # size in bytes; raises ClientError if the key does not exist

Reference: boto3 doc

Roelant
satznova
  • your inline comments are not Python – openwonk Oct 23 '19 at 03:21
  • happens to all of us – openwonk Oct 23 '19 at 13:20
  • Just a note on `object` instantiation above. If `'key'` doesn't exist, `object = s3.Object('bucket_name','key')` would still work... Only when executing `file_size = object.content_length` would you get a `ClientError` exception with `404` when you try to access the actual object in the bucket. – mbadawi23 Oct 30 '21 at 00:03
25

In boto3, using an S3 resource:

boto3.resource('s3').Bucket(bucketname).Object(keyname).content_length

For me, the head_object call of the S3 client returned an HTTP "403 Forbidden".

GabLeRoux
oyophant
6

You can also get a list of all objects if multiple files need to be checked. For a given bucket, run list_objects_v2 and then iterate through the response's 'Contents'. For example:

import boto3

s3_client = boto3.client('s3')
response_contents = s3_client.list_objects_v2(
    Bucket='name_of_bucket'
).get('Contents')

You'll get a list of dictionaries like this:

[{'Key': 'path/to/object1', 'LastModified': datetime, 'ETag': '"some etag"', 'Size': 2600, 'StorageClass': 'STANDARD'}, {'Key': 'path/to/object2', 'LastModified': datetime, 'ETag': '"some etag"', 'Size': 454, 'StorageClass': 'STANDARD'}, ... ]

Notice that each dictionary in the list contains a 'Size' key, which is the size of that particular object. The list is iterable:

for rc in response_contents:
    print(f"Size: {rc.get('Size')}")

You get sizes for all files you might be interested in:

Size: 2600
Size: 454
Size: 2600
...
Leo Skhrnkv
1

A lot of fancy answers, but here is a pretty simple one (which works for sure) where you neither have to perform a HEAD request nor create a resource representing the Amazon S3 Object.

List the objects in the bucket with the right prefix (otherwise .size might not work), and use .size to read the size metadata of each file/key.

import boto3

s3 = boto3.resource("s3")
bucket = s3.Bucket(AWS_S3_BUCKET)
# The prefix is the path following the bucket name
objs = bucket.objects.filter(Prefix='prefix')
for key in objs:
    file_size = round(key.size / 1024, 2)  # size in KB
    print(file_size)
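Since key.size is a raw byte count, a small plain-Python helper (no AWS involved; the unit cutoffs here assume binary prefixes, 1 KiB = 1024 bytes) can make the printout friendlier:

```python
def human_size(num_bytes):
    """Format a byte count using binary units (1 KiB = 1024 bytes)."""
    size = float(num_bytes)
    for unit in ('B', 'KiB', 'MiB', 'GiB', 'TiB'):
        if size < 1024 or unit == 'TiB':
            return f"{size:.2f} {unit}"
        size /= 1024

print(human_size(2600))  # 2.54 KiB
print(human_size(454))   # 454.00 B
```

In the loop above you would print human_size(key.size) instead of the rounded KB value.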