I'm currently using Scrapy with the following command line arguments:
scrapy crawl my_spider -o data.json
However, I'd prefer to 'save' this command in a Python script. Following https://doc.scrapy.org/en/latest/topics/practices.html, I have the following script:
import scrapy
from scrapy.crawler import CrawlerProcess
from apkmirror_scraper.spiders.sitemap_spider import ApkmirrorSitemapSpider
process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})
process.crawl(ApkmirrorSitemapSpider)
process.start() # the script will block here until the crawling is finished
However, it is unclear to me from the documentation what the equivalent of the -o data.json command line argument should be within the script. How can I make the script generate a JSON file?
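For reference, here is a minimal sketch of one possible approach. It assumes Scrapy 2.1 or later, where feed exports are configured through the FEEDS setting (older releases used FEED_FORMAT and FEED_URI instead). The helper names build_settings and main are hypothetical; the spider import and user agent are taken from the script above.

```python
def build_settings(output_path="data.json"):
    """Return crawler settings that export scraped items to a JSON file.

    build_settings is a hypothetical helper, not part of Scrapy.
    """
    return {
        "USER_AGENT": "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)",
        # FEEDS maps an output URI to its export options (assumes Scrapy >= 2.1).
        "FEEDS": {output_path: {"format": "json"}},
    }


def main():
    # Imported inside the function so the settings helper can be inspected
    # without Scrapy or the project package being importable.
    from scrapy.crawler import CrawlerProcess
    from apkmirror_scraper.spiders.sitemap_spider import ApkmirrorSitemapSpider

    process = CrawlerProcess(build_settings())
    process.crawl(ApkmirrorSitemapSpider)
    process.start()  # the script will block here until the crawling is finished
```

Calling main() at the end of the script would start the crawl and, under the assumptions above, write the scraped items to data.json in the working directory.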