I am new to all things Heroku/Django so apologies if this question is elementary. I've developed my first minor app using Django and have just deployed it into Heroku. A major part of the app is that I have a scraper that scrapes about 150K records and dumps the data into my Heroku Postgres database...it takes about 2.5 hours to run - ideally, I'd like this to run every other day but for now, that is another issue.
As of right now I have this scraper in a management command in my project and when I want to run it I manually use:
heroku run python manage.py scrape_data
I inspected my logs while this was running and noticed it is using a one-off dyno rather than a worker dyno, which I've read is better suited to long-running jobs like mine.
My question is: how can I change my Procfile (or something else) so that Heroku knows to use a worker dyno whenever I execute this task? If it's something I could hook into the Heroku Scheduler, even better, though from what I understand Scheduler jobs run exclusively on one-off dynos, so that is probably out of the question.
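(If one-off dynos turn out to be unavoidable, I did notice in the CLI docs that they can at least be detached and resized, which might be a partial workaround; I'm guessing the syntax is something like:)

```
# run as a detached one-off so it survives my terminal closing,
# on a larger dyno size (flags per my reading of the Heroku CLI docs)
heroku run:detached --size=standard-2x python manage.py scrape_data

# then follow its output
heroku logs --tail --dyno run.1
```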
Here's my current Procfile:
web: gunicorn project.wsgi --log-file -
I've tried reading some Heroku docs on queuing, background jobs, etc., but I haven't really been able to find anything that would help me. I'm wondering if it's as easy as adding something like the below to my Procfile? Then when I run the command, would Heroku know to use a worker dyno to execute it?
worker: python manage.py scrape_data
My current project structure:
Project
-> scraper_app
  -> management
    -> commands
      -> __init__.py
      -> scrape_data.py
-> home_page
  -> views.py
-> Procfile
-> requirements.txt
-> manage.py
-> runtime.txt
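In case it matters, scrape_data.py is just the standard Django management-command skeleton; the actual scraping and insert logic (which is what takes ~2.5 hours) is stripped out here:

```python
# scraper_app/management/commands/scrape_data.py
from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = "Scrape ~150K records and load them into the Postgres database"

    def handle(self, *args, **options):
        # long-running scrape + bulk insert lives here (~2.5 hours)
        ...
```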