Downloading a temporary file in Heroku and then reading it

Question

I'm trying to download a PDF from a site and then read it, all in a single Python script running on a single worker dyno in Heroku. However, my script requires that the file be temporarily stored in the ephemeral filesystem in order to be read.

From the documentation, this should be possible:

> Each dyno gets its own ephemeral filesystem, with a fresh copy of the most recently deployed code. During the dyno’s lifetime its running processes can use the filesystem as a temporary scratchpad, but no files that are written are visible to processes in any other dyno and any files written will be discarded the moment the dyno is stopped or restarted.
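One way to confirm this behavior on the dyno, independently of Tabula, is a minimal round-trip check: write some bytes to a scratch file and read them back in the same process. This is only a sketch; the helper name `scratchpad_roundtrip` and the file name `scratch_check.bin` are hypothetical.

```python
import os


def scratchpad_roundtrip(filename="scratch_check.bin", payload=b"hello from the dyno"):
    """Write payload to filename (resolved against the current working
    directory), read it back, delete it, and report whether the bytes match."""
    path = os.path.abspath(filename)  # shows exactly where the file lands
    with open(path, "wb") as fh:
        fh.write(payload)
    try:
        with open(path, "rb") as fh:
            return fh.read() == payload
    finally:
        os.remove(path)  # clean up the scratch file either way
```

Running `print(os.getcwd())` and `print(scratchpad_roundtrip())` in a one-off dyno would show where relative paths such as `'downloaded.pdf'` end up and whether a write followed by a read succeeds there.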

Yet no matter what I do, the script throws an error similar to the one I get on my local machine when the file does not exist (otherwise, the script runs fine locally).

See the relevant part of my code below; I am using Tabula to convert the PDF into a CSV.

Note also that when I check the file size on Heroku, it returns the correct value, so the file has been downloaded and is in the filesystem, but for some reason the Tabula wrapper cannot read it.

```python
import os
import sys
import urllib  # Python 2; in Python 3 this is urllib.request.urlretrieve

import tabula

# urllib.urlretrieve(url[, filename[, reporthook[, data]]])
urllib.urlretrieve(url, 'downloaded.pdf')

# check whether the PDF downloaded by checking its file size
filesize = os.path.getsize('downloaded.pdf')
print filesize  # this returns the correct value

# if the PDF was downloaded correctly, convert its contents to CSV
if filesize > 30000:
    tabula.convert_into("downloaded.pdf",  # error at this line
                        "downloaded.csv",
                        pages="all",
                        output_format="csv")
else:
    print('404 error')
    sys.exit(1)  # sys.exit must be called; a bare `sys.exit` does nothing
```
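The size threshold only shows that *something* was saved; an HTML error page can also exceed 30,000 bytes. A sketch, assuming Python 3 and following the `tempfile` suggestion from the comments: download into a named temporary file, return its absolute path, and check for the `%PDF-` signature that PDF files normally begin with. The function names `download_to_temp` and `looks_like_pdf` are hypothetical.

```python
import os
import tempfile
from urllib.request import urlretrieve


def download_to_temp(url, suffix=".pdf"):
    """Download url into a NamedTemporaryFile that survives closing and
    return its absolute path. The caller is responsible for deleting it."""
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    tmp.close()  # close our handle; urlretrieve reopens the file by name
    urlretrieve(url, tmp.name)
    return os.path.abspath(tmp.name)


def looks_like_pdf(path):
    """Return True if the file starts with the usual %PDF- header bytes."""
    with open(path, "rb") as fh:
        return fh.read(5) == b"%PDF-"
```

Usage would then be roughly: `pdf_path = download_to_temp(url)`, check `looks_like_pdf(pdf_path)`, pass the absolute path to `tabula.convert_into(pdf_path, "downloaded.csv", pages="all", output_format="csv")`, and call `os.remove(pdf_path)` afterwards. Per the comments, switching to `tempfile` alone did not fix the error here, so this mainly rules out the download itself as the culprit.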

My question is similar to this question, except that I am running the script on a single dyno, which should make it possible.

Have you tried storing the file using [`tempfile`](https://docs.python.org/3.6/library/tempfile.html)? That should make your script more platform-independent. — Hewbot, Jun 22 '17 at 14:56
Unfortunately storing the file using tempfile still didn't solve it. Still getting the same error :( — Kyap, Jun 26 '17 at 02:33
After further investigation the problem appears to be with the Tabula wrapper. Reading and opening a file (even without using tempfile) for use in another library appears to work fine. — Kyap, Jun 28 '17 at 05:22
What's the exact error you are seeing? For temporary files you can use the local dyno filesystem without any issues normally. — Denis Cornehl, Aug 02 '20 at 07:12
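One assumption worth ruling out, given that the failure appears to be inside the Tabula wrapper: tabula-py wraps tabula-java and needs a Java runtime to run it, which a Python-only Heroku app may not have. If the `java` executable cannot be found, the resulting "No such file or directory" error can look like a missing-file error even though the PDF is present. A minimal sketch, assuming Python 3.5+, that checks for a Java executable on `PATH`; the helper name `java_runtime_info` is hypothetical.

```python
import shutil
import subprocess


def java_runtime_info():
    """Return the banner from `java -version` if a java executable is on
    PATH, otherwise None."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` prints its banner to stderr, not stdout
    result = subprocess.run(
        [java, "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stderr.strip()
```

If this returns `None` when run on the dyno, adding a JVM to the app (for example, Heroku's JVM buildpack alongside the Python one) would be the next thing to try.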
