
I have a .docx file in my AWS S3 bucket and need to read it with python-docx. I wrote this:

from docx import Document
document = Document('https://my-first-backup-bucket-v1.s3-ap-southeast-1.amazonaws.com/New+Proposed+Quote.docx')

This raises an error: PackageNotFoundError: Package not found at 'https://my-first-backup-bucket-v1.s3-ap-southeast-1.amazonaws.com/New+Proposed+Quote.docx'

Why?

When I open the same file in a browser, it opens successfully. For testing purposes I made this file publicly accessible, so anyone can try it. Can anyone please help with this?

GOPI M

1 Answer


From Document objects — python-docx 0.8.10 documentation:

docx.Document(docx=None)

Return a Document object loaded from docx, where docx can be either a path to a .docx file (a string) or a file-like object. If docx is missing or None, the built-in default document “template” is loaded.

In other words, a string argument is treated as a path to a file on the local file system. The documentation does not say that a URL is accepted.

Therefore, you should download the file from Amazon S3, then point to it on the local file system.
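Because the documentation quoted above also allows a file-like object, one option is to download the bytes into memory and pass that buffer to `Document`. The following is only a minimal sketch, assuming the object is publicly readable over HTTPS (as the question states); the helper names `fetch_to_buffer` and `load_document_from_url` are made up for illustration and are not part of python-docx.

```python
import io
import urllib.request


def fetch_to_buffer(url):
    """Download the resource at `url` and return it as an in-memory binary buffer."""
    with urllib.request.urlopen(url) as response:
        return io.BytesIO(response.read())


def load_document_from_url(url):
    """Open a .docx stored at a URL by handing python-docx a file-like object."""
    # Imported here so that fetch_to_buffer can be used without python-docx installed.
    from docx import Document

    return Document(fetch_to_buffer(url))
```

With the URL from the question, `load_document_from_url('https://my-first-backup-bucket-v1.s3-ap-southeast-1.amazonaws.com/New+Proposed+Quote.docx')` should return a `Document`, provided the file is still public. For a private object this approach would not work as written, and the file would need to be fetched with credentials instead.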

John Rotenstein
  • Thanks for the info. But if we want to handle this in an AWS Lambda function (which is a serverless service), how can we download the file? Any reference? – GOPI M Jan 29 '20 at 14:04
  • You can use [`download_file()`](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.download_file) in the AWS SDK for Python (boto3) to download a file to `/tmp/`. Be sure to delete it when you have finished using it, since only 512 MB of storage space is provided there by default. – John Rotenstein Jan 29 '20 at 22:05
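
The Lambda approach described in the comment above could look roughly like this. It is a sketch under assumptions, not a tested deployment: `process_s3_object` is a hypothetical helper, the `event["bucket"]` and `event["key"]` fields are invented for illustration, and the function is assumed to have permission to read the object and to have boto3 and python-docx packaged with it.

```python
import os
import tempfile


def process_s3_object(s3_client, bucket, key, handler, tmp_dir="/tmp"):
    """Download s3://bucket/key to a temporary file, run handler(path), then delete the file."""
    suffix = os.path.splitext(key)[1]
    fd, local_path = tempfile.mkstemp(suffix=suffix, dir=tmp_dir)
    os.close(fd)  # download_file opens the path itself
    try:
        s3_client.download_file(bucket, key, local_path)
        return handler(local_path)
    finally:
        # Always free the limited /tmp space, even if the handler fails.
        os.remove(local_path)


def lambda_handler(event, context):
    # Imported here so the helper above can be exercised without these packages.
    import boto3
    from docx import Document

    s3 = boto3.client("s3")
    # Hypothetical event shape: {"bucket": "...", "key": "..."}
    return process_s3_object(
        s3,
        event["bucket"],
        event["key"],
        lambda path: len(Document(path).paragraphs),
    )
```

The `try`/`finally` is there so the temporary file is removed whether or not parsing succeeds, which matters because `/tmp` can be reused between invocations of the same Lambda instance.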