I need to pass some parameters taken from a JSON file to a spider. I have read that this is possible through scrapyd using schedule.json, but I don't understand how to pass the JSON file. Does anyone have experience with this?
2 Answers
8
You don't pass the arguments using a JSON file. Scrapyd has a JSON API, and you pass spider arguments as extra POST parameters to its schedule.json endpoint, e.g.:

    $ curl http://localhost:6800/schedule.json -d project=myproject -d spider=somespider -d myargument="value"

You can handle the arguments passed through kwargs:

    from scrapy import Spider

    class MySpider(Spider):
        name = 'somespider'

        def __init__(self, *args, **kwargs):
            super(MySpider, self).__init__(*args, **kwargs)
            # Arguments sent to schedule.json arrive here as keyword arguments
            self.myargument = kwargs.get('myargument', '')

See http://scrapyd.readthedocs.org/en/latest/api.html for more info.
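
If you would rather schedule the job from Python than from curl, the same request can be sketched with the standard library. This is only a rough equivalent of the curl call above: `build_schedule_request` is a hypothetical helper name, and the host, project, spider and `myargument` values are the placeholders from the example, not anything Scrapyd requires.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_schedule_request(base_url, project, spider, **spider_args):
    """Build (but do not send) a POST request for Scrapyd's schedule.json."""
    # project and spider are required; other fields (apart from Scrapyd's own
    # options such as setting or jobid) are handed to the spider as arguments.
    fields = {"project": project, "spider": spider, **spider_args}
    body = urlencode(fields).encode("ascii")
    return Request(base_url.rstrip("/") + "/schedule.json", data=body, method="POST")


request = build_schedule_request(
    "http://localhost:6800", "myproject", "somespider", myargument="value"
)
# To actually schedule the job: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the form body before anything reaches the Scrapyd server.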

marven
- I have seen it, but I don't understand where to pass the JSON file as an argument of the spider. – eng_mazzy Jul 08 '14 at 12:08
- You can't pass a file per se. The closest thing you could do is pass the path of the file (e.g. `-d myargument=/path/to/file`) and have your spider handle that somewhere in its code. – marven Jul 09 '14 at 01:15
- If I'm hosting my scrapyd instance on an AWS EC2 Linux instance and I **needed** to pass a JSON file, how would I go about that? http://stackoverflow.com/questions/42284726/input-output-for-scrapyd-instance-hosted-on-an-amazon-ec2-linux-instance – Eitan Seri-Levi Feb 17 '17 at 00:22
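
The file-path idea from the comments could look roughly like this on the spider side. It is only a sketch: `load_params` is a hypothetical helper, and it assumes the file already exists on the machine where scrapyd runs the spider, since only the path string travels through schedule.json.

```python
import json


def load_params(path):
    """Read spider parameters from a JSON file at the given path."""
    # The path is whatever string was sent to schedule.json; the file itself
    # must be readable on the host running the spider (an assumption here).
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Inside the spider's `__init__` you would then call something like `load_params(kwargs['myargument'])`, with `myargument` holding the path.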
0
I had the same question (I wanted to pass a JSON file to the spiders to implement a simple distributed crawl system), and I solved it by converting the contents of the JSON file to a string and passing that string as an argument in scrapyd.
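
A minimal sketch of that approach, under a few assumptions: the argument name `params` and both helper names are made up for illustration, and the spider side relies on scrapyd delivering every argument as a plain string.

```python
import json
from urllib.parse import urlencode


def build_schedule_body(project, spider, params):
    """Form-encode a schedule.json body with params packed into one JSON string."""
    return urlencode({
        "project": project,
        "spider": spider,
        "params": json.dumps(params),  # hypothetical argument name
    })


def parse_spider_params(kwargs):
    """Decode the JSON string again, e.g. inside the spider's __init__."""
    # Arguments arrive as strings, so the structure is rebuilt with json.loads.
    return json.loads(kwargs.get("params", "{}"))
```

The body would be POSTed to schedule.json like any other set of arguments, and the spider would call `parse_spider_params(kwargs)` to get the original dictionary back.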

heamon7