I'm trying to open a .pptx file stored in Amazon S3 and read it with the python-pptx library. This is the code:

from pptx import Presentation
import boto3

s3 = boto3.resource('s3')

obj = s3.Object('bucket', 'key')
body = obj.get()['Body']
prs = Presentation(body)

It raises "AttributeError: 'StreamingBody' object has no attribute 'seek'". Shouldn't this work? How can I fix it? I also tried calling read() on body first. Is there a solution that doesn't require actually downloading the file?

  • Which line is throwing the error? Also, it is impossible to display the file without downloading it - either to disk or memory. – fredrik Jan 25 '21 at 18:31

1 Answer

To load a file from S3, download it (or stream it) into memory and wrap the bytes in io.BytesIO, which gives pptx.Presentation a seekable file-like object it can handle.

import io
import boto3

from pptx import Presentation

s3 = boto3.client('s3')
s3_response_object = s3.get_object(Bucket='bucket', Key='file.pptx')
object_content = s3_response_object['Body'].read()

prs = Presentation(io.BytesIO(object_content))
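
As background on the original error, here is a minimal standard-library sketch of why wrapping the bytes helps. A .pptx file is a ZIP archive, and reading a ZIP requires seek(), which the S3 StreamingBody does not provide. NonSeekableStream, make_zip_bytes, and can_open_as_zip are hypothetical names used only for illustration; the stand-in class only imitates the relevant limitation and is not the real botocore object.

```python
import io
import zipfile


class NonSeekableStream:
    """Hypothetical stand-in for a read-only stream: read() but no seek()."""

    def __init__(self, data: bytes):
        self._buffer = io.BytesIO(data)

    def read(self, amt=None):
        return self._buffer.read(amt)


def make_zip_bytes() -> bytes:
    """Build a tiny ZIP archive in memory (a placeholder, not a real .pptx)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("ppt/presentation.xml", "<placeholder/>")
    return buf.getvalue()


def can_open_as_zip(fileobj) -> bool:
    """Return True if zipfile can read the archive from the given object."""
    try:
        with zipfile.ZipFile(fileobj) as zf:
            zf.namelist()
        return True
    except (AttributeError, OSError, zipfile.BadZipFile):
        # A stream without seek() ends up here.
        return False


data = make_zip_bytes()
stream_ok = can_open_as_zip(NonSeekableStream(data))
buffer_ok = can_open_as_zip(io.BytesIO(NonSeekableStream(data).read()))
print(stream_ok, buffer_ok)
```

Reading the whole body into io.BytesIO costs memory proportional to the file size, but nothing is written to disk.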

ref (journaldev):

Just like what we do with variables, data can be kept as bytes in an in-memory buffer when we use the io module’s Byte IO operations.

bcosta12
  • How would you then save/write the file back to s3? – blue_chip Feb 24 '21 at 16:07
  • @blue_chip https://boto3.amazonaws.com/v1/documentation/api/1.9.185/guide/s3-uploading-files.html – bcosta12 Feb 24 '21 at 19:34
  • But from what I understand, `prs` is not a file or file-like object so I don't see how I can use it with `upload_file()`. It's this conversion from Presentation object to file-like object that I'm lost with. Btw, thanks for replying! – blue_chip Feb 24 '21 at 20:08
  • @blue_chip You can do the same steps in reverse: create a BytesIO buffer and send that to S3. Take a look at https://stackoverflow.com/questions/48525338/how-to-upload-stream-to-aws-s3-with-python – bcosta12 Feb 24 '21 at 20:30
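
The reverse direction discussed in the comments above could look roughly like the sketch below. It assumes that Presentation.save() accepts a file-like object and that the boto3 S3 client's upload_fileobj() is used for the upload; upload_presentation is a hypothetical helper name, and the bucket and key are the same placeholders used in the answer.

```python
import io


def upload_presentation(prs, s3_client, bucket, key):
    """Serialize a python-pptx Presentation in memory and upload it to S3."""
    buffer = io.BytesIO()
    prs.save(buffer)   # write the .pptx bytes into the in-memory buffer
    buffer.seek(0)     # rewind so the upload starts at the first byte
    s3_client.upload_fileobj(buffer, bucket, key)


# Usage (requires boto3, python-pptx, and AWS credentials):
#   import boto3
#   from pptx import Presentation
#   prs = Presentation(io.BytesIO(object_content))
#   upload_presentation(prs, boto3.client('s3'), 'bucket', 'file.pptx')
```

The seek(0) call matters: after save() the buffer position is at the end, so an upload without rewinding would send no data.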