1

When I try to use HTTPCACHE with scrapyd I get the following error:

[scrapy] WARNING: Disabled Httpcache Middlware: unable to find scrapy.cfg file to infer project data dir

Acorn

1 Answer

3

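The problem is that HTTP caching defaults to the relative path httpcache, which Scrapy resolves against the project data directory.

This works when you run the spider from the command line inside the project, where scrapy.cfg can be found, but not when scrapyd runs it as a service: there is no scrapy.cfg to locate the project data directory, so the cache middleware is disabled.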

The solution is to set the HTTPCACHE_DIR setting (see the docs) to an absolute path.
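As a rough sketch, the relevant part of the project's settings.py might look like the following. HTTPCACHE_ENABLED and HTTPCACHE_DIR are real Scrapy settings, but the path shown is only a placeholder I made up; substitute any absolute directory on the server that the scrapyd process can write to.

```python
# settings.py (fragment) -- a minimal sketch, not a complete settings file

# Turn on the HTTP cache middleware.
HTTPCACHE_ENABLED = True

# Absolute path, so Scrapy does not need scrapy.cfg to locate the
# project data directory. The path below is a hypothetical example;
# the directory must be writable by the user running scrapyd.
HTTPCACHE_DIR = "/var/cache/scrapy/httpcache"
```

Because the path is absolute, the cache location no longer depends on where scrapyd unpacks and runs the deployed project.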

robasta
Acorn
  • That is, an absolute path on the server. Even though the spider is a client that is deployed to a server, its settings tell the scrapyd service where on the server's filesystem to put the cache. The directory must be writable by the scrapyd process. – John Mee Jun 19 '12 at 08:28
  • Still a relevant answer, but the link to the docs is now [this one](https://doc.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-dir). – bartaelterman Apr 06 '17 at 07:05