
I am trying to run tabula-py on AWS Lambda in a Python 3.7 environment. The code is quite straightforward:

import tabula

def main(event, context):
    try:
        print(event['Url'])
        df = tabula.read_pdf(event['Url'])
        print(str(df))
        return {
            "StatusCode": 200,
            "ResponseCode": 0,
            "ResponseMessage": str(df)
        }
    except Exception as e:
        print('exception = %r' % e)
        return {
            "ResponseCode": 1,
            "ErrorMessage": str(e)
        }

As you can see, there is just one real line of code, the call to tabula.read_pdf(). I am not writing files anywhere, yet I get this exception: exception = OSError(30, 'Read-only file system')

FYI, the tabula details are available here

Here is what I have already tried, without success:

  1. Verified that the URL is read correctly. Also tried a hard-coded link in the code.
  2. Searched Google, Stack Overflow, and similar sites, but found nothing that solves this issue.
  3. Removed the __pycache__ directory from the ZIP before uploading it to update the code. Also ensured that no OS-specific local directory is in the Lambda deployment package.

Any help will be highly appreciated.

Sukhi

1 Answer


tabula is writing to the file system, which on AWS Lambda is read-only apart from /tmp. For now, you could try a different PDF table-extraction package, such as camelot.

  • Thanks, Yogesh. But the problem is that tabula gives much better output than camelot on my PDF files. I already have camelot exposed via an API. – Sukhi Feb 25 '20 at 10:32
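
If switching libraries is not an option, one workaround that may be worth trying is to download the PDF into /tmp yourself and pass tabula a local path instead of a URL. This is only a sketch under an assumption: that the error comes from tabula-py saving a remote file into the current working directory (read-only on Lambda) before parsing it. The helper name download_to_tmp and its tmp_dir parameter are made up for illustration and are not part of tabula-py.

```python
import os
import shutil
import tempfile
import uuid
from urllib.request import urlopen


def download_to_tmp(url, tmp_dir=None):
    """Save the file at `url` into a writable directory and return its local path.

    Hypothetical helper, not part of tabula-py.
    """
    # On Lambda, /tmp is the writable location; tempfile.gettempdir()
    # normally resolves to it unless TMPDIR/TEMP/TMP say otherwise.
    tmp_dir = tmp_dir or tempfile.gettempdir()
    local_path = os.path.join(tmp_dir, "{}.pdf".format(uuid.uuid4()))
    with urlopen(url) as response, open(local_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return local_path


def main(event, context):
    # Imported here so the download helper can be exercised
    # even where tabula is not installed.
    import tabula

    local_path = None
    try:
        local_path = download_to_tmp(event["Url"])
        # Assumption: with a local path, tabula no longer needs to
        # write a copy of the PDF into the working directory.
        df = tabula.read_pdf(local_path)
        return {
            "StatusCode": 200,
            "ResponseCode": 0,
            "ResponseMessage": str(df),
        }
    except Exception as e:
        print("exception = %r" % e)
        return {
            "ResponseCode": 1,
            "ErrorMessage": str(e),
        }
    finally:
        # Remove the temporary copy so /tmp does not fill up across warm invocations.
        if local_path and os.path.exists(local_path):
            os.remove(local_path)
```

If the same OSError still appears with a local path, then something other than the download is writing outside /tmp, and this sketch will not be enough on its own.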