
I'm currently using Scrapy with the following command line arguments:

scrapy crawl my_spider -o data.json

However, I'd prefer to 'save' this command in a Python script. Following https://doc.scrapy.org/en/latest/topics/practices.html, I have the following script:

import scrapy
from scrapy.crawler import CrawlerProcess

from apkmirror_scraper.spiders.sitemap_spider import ApkmirrorSitemapSpider

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

process.crawl(ApkmirrorSitemapSpider)
process.start() # the script will block here until the crawling is finished

However, it is unclear to me from the documentation what the equivalent of the -o data.json command line argument should be within the script. How can I make the script generate a JSON file?

Kurt Peek
  • Possible duplicate of [scrapy from script output in json](http://stackoverflow.com/questions/23574636/scrapy-from-script-output-in-json) – Casper Apr 18 '17 at 09:49
  • Do refer this [answer](http://stackoverflow.com/questions/23574636/scrapy-from-script-output-in-json) – Jaysheel Utekar Apr 18 '17 at 09:50
  • Possible duplicate of [scrapy from script output in json](https://stackoverflow.com/questions/23574636/scrapy-from-script-output-in-json) – Gallaecio Jul 16 '19 at 16:27

1 Answer


You need to add the FEED_FORMAT and FEED_URI settings to your CrawlerProcess:

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
    'FEED_FORMAT': 'json',
    'FEED_URI': 'data.json'
})
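
For a complete picture, here is a minimal sketch of the full script with those settings applied, reusing the spider import and user agent from the question (the module path apkmirror_scraper.spiders.sitemap_spider and the spider class are the asker's, not something generic):

from scrapy.crawler import CrawlerProcess

from apkmirror_scraper.spiders.sitemap_spider import ApkmirrorSitemapSpider

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
    'FEED_FORMAT': 'json',   # equivalent of the output format implied by -o data.json
    'FEED_URI': 'data.json'  # equivalent of the output target given to -o
})

process.crawl(ApkmirrorSitemapSpider)
process.start()  # blocks until the crawl finishes; data.json is written by the feed exporter

Note that in newer Scrapy releases (2.1 and later), FEED_FORMAT and FEED_URI are superseded by the FEEDS setting, a dict mapping each output URI to its options; an equivalent configuration would look roughly like:

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
    'FEEDS': {
        'data.json': {'format': 'json'},
    },
})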
vold