
In AWS, I'm trying to save a file to S3 in Python using a Lambda function. While this works on my local computer, I am unable to get it to work in Lambda. I've been working on this problem for most of the day and would appreciate help. Thank you.

import boto3
import requests

def pdfToTable(PDFfilename, apiKey, fileExt, bucket, key):

    # parsing a PDF using an API
    fileData = (PDFfilename, open(PDFfilename, "rb"))
    files = {"f": fileData}
    postUrl = "https://pdftables.com/api?key={0}&format={1}".format(apiKey, fileExt)
    response = requests.post(postUrl, files=files)
    response.raise_for_status()

    # this code is probably the problem!
    s3 = boto3.resource('s3')
    bucket = s3.Bucket('transportation.manifests.parsed')
    with open('/tmp/output2.csv', 'rb') as data:
        data.write(response.content)
        key = 'csv/' + key
        bucket.upload_fileobj(data, key)

    # FYI, on my own computer, this saves the file
    with open('output.csv', "wb") as f:
        f.write(response.content)

In S3, there is a bucket transportation.manifests.parsed containing the folder csv where the file should be saved.

The type of response.content is bytes.

From AWS, the error from the current set-up above is [Errno 2] No such file or directory: '/tmp/output2.csv': FileNotFoundError. In fact, my goal is to save the file to the csv folder under a unique name, so tmp/output2.csv might not be the best approach. Any guidance?

In addition, I've tried using 'wb' and 'w' instead of 'rb', also to no avail. The error with 'wb' is Input <_io.BufferedWriter name='/tmp/output2.csv'> of type: <class '_io.BufferedWriter'> is not supported. The documentation suggests that 'rb' is the recommended usage, but I do not understand why that would be the case.

Also, I've tried s3_client.put_object(Key=key, Body=response.content, Bucket=bucket) but receive An error occurred (404) when calling the HeadObject operation: Not Found.
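For context, both failure modes described above can be reproduced locally without AWS; the path below is illustrative:

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "does-not-exist-12345.csv")

# 'rb' opens an existing file for reading, so a missing file raises FileNotFoundError
try:
    open(path, "rb")
except FileNotFoundError as e:
    print(e.errno)  # 2, i.e. [Errno 2] No such file or directory

# 'w' opens a text stream, which rejects bytes such as response.content
with open(path, "w") as f:
    try:
        f.write(b"bytes")
    except TypeError as e:
        print(type(e).__name__)  # TypeError

os.remove(path)
```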

johndoe
tskittles
    You have `open('/tmp/output2.csv', 'rb')` but you are trying to write to the file. Note you probably don't have to create a temporary file. The bucket has a [`put_object`](http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Bucket.put_object) method you can use. – Alasdair Mar 07 '18 at 23:50
  • just updated the post to reflect responses to your comment. thoughts? – tskittles Mar 08 '18 at 00:01
  • You need to fix the problem that @Alasdair indicated. You're attempting to open a non-existent file for *reading* hence you get FileNotFoundError). You need to open it for writing. Plus CSV files are text files, not binary files, so "wt" (or just "w" because text is the default) would typically be more appropriate. – jarmod Mar 08 '18 at 00:31
  • @jarmod When I switch it to 'w', the error is `Input <_io.TextIOWrapper name='/tmp/output2.csv' mode='w' encoding='UTF-8'> of type: is not supported` – tskittles Mar 08 '18 at 00:43
  • You need to use `'w'` or `'wb'` to *write* the file. The docs you link to are for uploading that file, which is a separate step. You haven't shown enough information to know why `put_object` failed. You already have the bucket so I would do `bucket.put_object(Key=key, Body=response.content)`. If that doesn't work you should show the complete code you tried, and the full traceback. – Alasdair Mar 08 '18 at 10:19
  • @tskittles I am running into same problem. Here's my SO link: https://stackoverflow.com/questions/68915908/save-image-data-from-a-iterator-object-to-aws-s3-in-python – kms Aug 25 '21 at 03:55

2 Answers


Assuming Python 3.6. The way I usually do this is to wrap the bytes content in a BytesIO wrapper to create a file-like object. Then, per the boto3 docs, you can use the transfer manager (`upload_fileobj`) for a managed transfer:

from io import BytesIO
import boto3

s3 = boto3.client('s3')

# wrap the raw bytes in a readable, file-like object
fileobj = BytesIO(response.content)

s3.upload_fileobj(fileobj, 'mybucket', 'mykey')

If that doesn't work, I'd double-check that all the IAM permissions are correct.
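The reason this works is that `upload_fileobj` only needs a readable binary stream, and `BytesIO` provides one without touching the filesystem. A quick sketch of that behavior (the sample bytes are made up):

```python
from io import BytesIO

content = b"col1,col2\n1,2\n"  # stands in for response.content
fileobj = BytesIO(content)

# boto3 calls read() on this object, just like a real file opened with 'rb'
assert fileobj.read() == content

# rewind before handing the object to upload_fileobj, or it will upload 0 bytes
fileobj.seek(0)
```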

abigperson
  • I am trying to write an Avro file to S3 using DataFileWriter from the Avro package. Let me know if I could do that without having to use a temp file. – Minerva Sep 09 '21 at 13:02
  • Sorry I am not familiar with Avro. You could post this as a new question and I am sure it'll get some better attention that way! – abigperson Sep 10 '21 at 15:41

You have a writable stream that you're asking boto3 to use as a readable stream, which won't work.

Write the file, and then simply use bucket.upload_file() afterwards, like so:

s3 = boto3.resource('s3')
bucket = s3.Bucket('transportation.manifests.parsed')
with open('/tmp/output2.csv', 'wb') as data:
    data.write(response.content)  # response.content is bytes, so open in binary mode

key = 'csv/' + key
bucket.upload_file('/tmp/output2.csv', key)
jarmod
  • For case with concurrent invocations of the lambda - won't it create collisions using same '/tmp/output2.csv'? – Alex_Y Jun 28 '22 at 14:33
  • @Alex_Y No, concurrent Lambda function invocations do not use the same runtime environment. There may, however, be a leftover file in /tmp from a previous Lambda function invocation, so the function should take that into account (e.g. delete or overwrite any existing file, or simply create a uniquely-named file). – jarmod Jun 28 '22 at 14:39
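One way to avoid stale or clashing files in /tmp is to generate a unique name per invocation; a minimal sketch, where the 'output-' prefix and sample bytes are placeholders:

```python
import os
import tempfile
import uuid

# build a collision-resistant temp path; uuid4 makes name clashes vanishingly unlikely
tmp_path = os.path.join(tempfile.gettempdir(), "output-{}.csv".format(uuid.uuid4().hex))

with open(tmp_path, "wb") as data:
    data.write(b"col1,col2\n1,2\n")  # response.content in the real code

# the file now exists on disk and could be passed to bucket.upload_file(tmp_path, key)
assert os.path.exists(tmp_path)

os.remove(tmp_path)  # tidy up so nothing lingers for a later invocation
```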