
In Scrapy 2.0.1 I am writing new data to a JSON file. At the end of the process I would like to append the Scrapy statistics. I know that there is a stats collection available:

https://docs.scrapy.org/en/latest/topics/stats.html

So the right call is probably stats.get_stats()

In conjunction with:

class ExtensionThatAccessStats(object):

    def __init__(self, stats):
        self.stats = stats

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler.stats)

My current pipeline looks like this:

from scrapy.exporters import JsonItemExporter

class test_pipeline(object):

    file = None

    def open_spider(self, spider):
        self.file = open('data/test.json', 'wb')
        self.exporter = JsonItemExporter(self.file)
        self.exporter.start_exporting()

    def close_spider(self, spider):
        self.exporter.finish_exporting()
        self.file.close()

I am new to Python. How do I add this functionality to have the stats appended to the json file?
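Concretely, the goal is a post-processing step like the following sketch, which uses only the standard json module (the file name and the stats keys here are placeholders, and a plain dict stands in for the real result of stats.get_stats()):

```python
import json

def append_stats(path, stats):
    """Append the crawl stats as one final object in the exported JSON array."""
    with open(path) as f:
        items = json.load(f)           # items written by JsonItemExporter
    items.append({"stats": stats})     # stats dict, e.g. from stats.get_stats()
    with open(path, "w") as f:
        json.dump(items, f)

# demo with a stand-in items file and a stand-in stats dict
with open("test.json", "w") as f:
    json.dump([{"title": "an item"}], f)
append_stats("test.json", {"item_scraped_count": 1})
```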

merlin

1 Answer


You can use a custom stats collector that persists the stats at the end of the crawl.

Add it to settings.py:

STATS_CLASS = 'mycrawler.MyStatsCollector.MyStatsCollector'

Here's a basic implementation for MyStatsCollector.py that outputs JSON to a file:

from scrapy.statscollectors import StatsCollector
from scrapy.utils.serialize import ScrapyJSONEncoder

class MyStatsCollector(StatsCollector):
    # _persist_stats() is called by Scrapy when the spider closes
    def _persist_stats(self, stats, spider):
        encoder = ScrapyJSONEncoder()
        with open("stats.json", "w") as file:
            data = encoder.encode(stats)
            file.write(data)
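ScrapyJSONEncoder is used here because the stats dict contains values that the plain json module cannot serialize, notably datetime objects such as start_time and finish_time. A rough stdlib approximation of that behaviour (the exact output format of the real encoder may differ):

```python
import datetime
import json

class DateTimeEncoder(json.JSONEncoder):
    # rough stand-in for ScrapyJSONEncoder: serialize datetimes as strings
    # instead of letting json.dumps raise a TypeError
    def default(self, o):
        if isinstance(o, datetime.datetime):
            return o.isoformat(sep=" ")
        return super().default(o)

stats = {
    "item_scraped_count": 10,
    "finish_time": datetime.datetime(2020, 4, 1, 12, 0, 0),
}
print(json.dumps(stats, cls=DateTimeEncoder))
# {"item_scraped_count": 10, "finish_time": "2020-04-01 12:00:00"}
```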
brunobg