60

I'm using boto3 and trying to upload files. It would be helpful if anyone could explain the exact difference between the file_upload() and put_object() S3 bucket methods in boto3.

  • Is there any performance difference?
  • Does either of them handle multipart uploads behind the scenes?
  • What are the best use cases for both?
Tushar Niras
  • Can you add links to the docs for `file_upload()`? –  May 02 '17 at 13:43
  • AFAIK, file_upload() uses s3transfer, which is faster for some tasks: http://boto3.readthedocs.io/en/latest/_modules/boto3/s3/transfer.html – mootmoot May 04 '17 at 08:57

3 Answers

61

The upload_file method is handled by the S3 Transfer Manager, which means that it will automatically handle multipart uploads behind the scenes for you, if necessary.

The put_object method maps directly to the low-level S3 API request. It does not handle multipart uploads for you. It will attempt to send the entire body in one request.
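To make the contrast concrete, here is a minimal sketch of the two calls side by side. It assumes an `s3_client` created with `boto3.client("s3")`; the helper names, bucket, and key are hypothetical and only there for illustration.

```python
def upload_with_transfer_manager(s3_client, path, bucket, key):
    """upload_file takes a local path; the transfer manager decides
    whether to split the file into parts (multipart upload)."""
    s3_client.upload_file(Filename=path, Bucket=bucket, Key=key)


def upload_with_single_put(s3_client, path, bucket, key):
    """put_object sends the whole body in a single PutObject request."""
    with open(path, "rb") as body:
        return s3_client.put_object(Body=body, Bucket=bucket, Key=key)


# Usage (needs boto3 and AWS credentials, so it is not run here):
#   import boto3
#   s3_client = boto3.client("s3")
#   upload_with_transfer_manager(s3_client, "/tmp/my_file.json", "my-bucket", "my_file.json")
#   upload_with_single_put(s3_client, "/tmp/my_file.json", "my-bucket", "my_file.json")
```

Passing the client in as an argument keeps the helpers easy to exercise with a stub, which is a design choice of this sketch rather than anything boto3 requires.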

Connor
garnaat
  • great! and then what about `put_object()` method of bucket ? – Tushar Niras May 04 '17 at 07:07
  • 1
    The ``put_object`` method maps directly to the low-level S3 API request. It does not handle multipart for you. It will attempt to send the entire body in one request. – garnaat May 04 '17 at 16:01
  • 2
    If you interrupt `upload_file` while it's doing a multipart upload, will this result in a broken upload? Because only some of the parts are uploaded and integrity checked, will S3 accept the half-uploaded file? – CMCDragonkai Apr 15 '20 at 11:18
  • per AWS documentation: "Amazon S3 never adds partial objects; if you receive a success response, Amazon S3 added the entire object to the bucket." – Alex Kir Dec 06 '21 at 14:42
6

Another difference worth noting is that the upload_file() API lets you track upload progress with a callback function. You can read about it here.
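As a rough sketch of such a callback: `upload_file()` accepts a `Callback` argument that is called with the number of bytes transferred since the previous call. The `ProgressTracker` class and `upload_with_progress` helper below are hypothetical names, and `s3_client` is assumed to come from `boto3.client("s3")`.

```python
import os
import threading


class ProgressTracker:
    """Accumulates the byte counts reported by upload_file's Callback."""

    def __init__(self, total_bytes):
        self._total = total_bytes
        self._seen = 0
        # The callback may be invoked from several transfer threads.
        self._lock = threading.Lock()

    def __call__(self, bytes_amount):
        with self._lock:
            self._seen += bytes_amount
            pct = 100.0 * self._seen / self._total if self._total else 100.0
            print(f"{self._seen} / {self._total} bytes ({pct:.1f}%)")

    @property
    def seen(self):
        return self._seen


def upload_with_progress(s3_client, path, bucket, key):
    """Upload a local file and report progress as chunks are sent."""
    tracker = ProgressTracker(os.path.getsize(path))
    s3_client.upload_file(Filename=path, Bucket=bucket, Key=key, Callback=tracker)
    return tracker
```

The lock is there because the transfer manager can upload parts concurrently, so the counter may be updated from more than one thread.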

Also, as already mentioned by boto's creator @garnaat, upload_file() uses multipart uploads behind the scenes, so it's not straightforward to check end-to-end file integrity (though there is a way). put_object() uploads the whole file in one shot (capped at 5 GB), which makes it easier to check integrity by passing Content-MD5, which is already available as a parameter of the put_object() API.
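A minimal sketch of the Content-MD5 check, assuming the data fits in memory and that `s3_client` comes from `boto3.client("s3")`. The helper names are hypothetical; `ContentMD5` is the `put_object()` parameter and expects the base64-encoded MD5 digest of the body.

```python
import base64
import hashlib


def content_md5(data: bytes) -> str:
    """Return the base64-encoded MD5 digest, the format Content-MD5 uses."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def put_with_md5(s3_client, data: bytes, bucket: str, key: str):
    """Send the body in one request along with its MD5 so that S3 can
    reject the upload if the bytes it receives do not match the digest."""
    return s3_client.put_object(
        Body=data,
        Bucket=bucket,
        Key=key,
        ContentMD5=content_md5(data),
    )
```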

Pranav Gupta
4

Another thing to mention is that put_object() takes the object's content itself through Body (bytes or a file-like object), whereas upload_file() takes the path of the file to upload. For example, if I have a JSON file already stored locally, I would use upload_file(Filename='/tmp/my_file.json', Bucket=my_bucket, Key='my_file.json').

Whereas if I had a dict within my job, I could convert the dict to JSON and use put_object() like so:

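import json
import boto3
s3_client = boto3.client('s3')  # my_bucket is assumed to hold the bucket name
records_to_update = {'Name': 'Sally'}
records_to_update_json = json.dumps(records_to_update, default=str)
s3_client.put_object(Body=records_to_update_json, Bucket=my_bucket, Key='my_records')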

deesolie