
Using Scrapy with Amazon S3 is fairly simple: you set

  • FEED_URI = 's3://MYBUCKET/feeds/%(name)s/%(time)s.jl'
  • FEED_FORMAT = 'jsonlines'
  • AWS_ACCESS_KEY_ID = [access key]
  • AWS_SECRET_ACCESS_KEY = [secret key]

and everything works just fine.
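
For reference, here is a minimal sketch of the same settings as they might appear in the project's settings.py (the bucket name and the key values are placeholders, not real values):

  # settings.py -- minimal sketch; replace the placeholders with your own values
  FEED_URI = 's3://MYBUCKET/feeds/%(name)s/%(time)s.jl'  # %(name)s and %(time)s are expanded by Scrapy
  FEED_FORMAT = 'jsonlines'
  AWS_ACCESS_KEY_ID = 'your-access-key-id'          # placeholder
  AWS_SECRET_ACCESS_KEY = 'your-secret-access-key'  # placeholder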

But Scrapyd seems to override those settings and saves the items on the server instead (with a link in its web interface).

Adding the `items_dir =` setting doesn't seem to change anything.

Which setting makes this work?

EDIT: Extra info that might be relevant - we are using Scrapy-Heroku.

arikg
  • Do you see anything in the scrapyd logs? Does it save items to S3 if you run your crawler directly via `scrapy crawl`? How did you tell scrapyd where your project's `settings` file is? – alecxe Apr 13 '13 at 19:24
  • Nothing in the logs as far as I can see. It does save to S3 when I do `scrapy crawl` (which tells me the S3 configuration is fine), and I just put the settings in the default location (I know it's being read because the `application` setting there works fine). – arikg Apr 14 '13 at 07:12

2 Answers


I faced the same problem. Removing the `items_dir=` line from the scrapyd.conf file worked for me.

Eric Aya

You can set the `items_dir` property to an empty value like this:

[scrapyd]
items_dir=

It seems that when that property is set, it takes precedence over the configured feed exporter. See http://scrapyd.readthedocs.org/en/latest/config.html for more information.
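
For context, a scrapyd.conf along these lines should let the spider's own feed settings (such as `FEED_URI`) take effect; the empty `items_dir` is the important part, and the other values shown are scrapyd's documented defaults, included only for illustration:

  [scrapyd]
  eggs_dir  = eggs
  logs_dir  = logs
  http_port = 6800
  # leave items_dir empty so scrapyd does not store items on the server
  items_dir =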

César Izurieta