I developed a few spiders in Scrapy and I want to test them on Heroku. Does anybody have any idea how to deploy a Scrapy spider on the Heroku cloud?
Yes, it's fairly simple to deploy and run your Scrapy spider on Heroku.
Here are the steps, using a real Scrapy project as an example:
Clone the project (note that it must have a requirements.txt file for Heroku to recognize it as a Python project):
git clone https://github.com/scrapinghub/testspiders.git
Add cffi to the requirements.txt file (e.g. cffi==1.1.0).
Create the Heroku application (this will add a new heroku git remote):
heroku create
Deploy the project (this will take a while the first time, when the slug is built):
git push heroku main
Run your spider:
heroku run scrapy crawl followall
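Taken together, the steps above look like this in a shell session (the repository URL and spider name come from the answer; the cffi version is only an example, and the commit step is an assumption needed so the requirements change reaches Heroku):

```shell
# Clone the example project (it ships a requirements.txt,
# so Heroku detects it as a Python app)
git clone https://github.com/scrapinghub/testspiders.git
cd testspiders

# Pin cffi in requirements.txt (version is an example)
echo "cffi==1.1.0" >> requirements.txt
git commit -am "Add cffi to requirements"

# Create the Heroku app (adds the heroku git remote) and deploy
heroku create
git push heroku main

# Run a spider on a one-off dyno
heroku run scrapy crawl followall
```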
Some notes:
- Heroku's disk is ephemeral. If you want to store the scraped data in a persistent place, you can use an S3 feed export (by appending -o s3://mybucket/items.jl to the crawl command) or use an add-on (like MongoHQ or Redis To Go) and write a pipeline to store your items there.
- It would be cool to run a Scrapyd server on Heroku, but it's not currently possible because the sqlite3 module (which Scrapyd requires) doesn't work on Heroku.
- If you want a more sophisticated solution for deploying your Scrapy spiders, consider setting up your own Scrapyd server or using a hosted service like Scrapy Cloud.
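As a sketch of the pipeline approach from the first note: the class below pushes each scraped item into an external key-value store (for example a Redis client built from the Redis To Go add-on's connection URL). The class name, key scheme, and injected `store` object are assumptions for illustration; any client exposing a set(key, value) method would work.

```python
import json


class KeyValueExportPipeline:
    """Store scraped items outside the dyno's ephemeral disk.

    `store` is assumed to be any client with a set(key, value) method,
    e.g. a redis.Redis instance created from the add-on's URL.
    """

    def __init__(self, store):
        self.store = store
        self.count = 0

    def process_item(self, item, spider):
        # Scrapy items convert cleanly to dicts, so serialize as JSON
        self.count += 1
        self.store.set("item:%d" % self.count, json.dumps(dict(item)))
        return item  # hand the item on to any later pipelines
```

In a real project you would wire the pipeline up through the ITEM_PIPELINES setting and build the client in from_crawler; this sketch only shows the process_item contract.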

Pablo Hoffman
- I believe 'heroku run' starts up a one-off dyno, which will result in costing more money. Is this the only option? – elgehelge Mar 25 '14 at 04:16
- @Helge One-off dynos don't cost any more per minute than standard dynos. – Daniel Naab Jun 20 '14 at 22:41
- You can use scrapy-heroku to run a Scrapyd server on Heroku! It has been working great for me. https://github.com/dmclain/scrapy-heroku – Arctelix Nov 15 '14 at 08:12