
Set-up

I export my data to a .csv file with the standard command in Terminal (Mac OS), e.g.

scrapy crawl spider -o spider_output.csv

Problem

When exporting a new spider_output.csv Scrapy appends it to the existing spider_output.csv.

I can think of two solutions:

  1. Tell Scrapy to overwrite the file instead of appending to it
  2. Tell Terminal to remove the existing spider_output.csv prior to crawling

I've read that (to my surprise) Scrapy currently isn't able to do 1. Some people have proposed workarounds, but I can't seem to get them to work.

I've found an answer to solution 2, but can't get it to work either.

Can somebody help me? Perhaps there is a third solution I haven't thought of?

LucSpan

3 Answers


There is an open issue with scrapy for this feature: https://github.com/scrapy/scrapy/issues/547

There are some solutions proposed in the issue thread:

scrapy runspider spider.py -t json --nolog -o - > out.json

Or just delete the output file before running the spider:

rm data.jl; scrapy crawl myspider -o data.jl
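A slightly safer sketch of the same idea (assuming a project with a spider named `myspider`): `rm -f` suppresses the error when the file doesn't exist yet, and `&&` only starts the crawl if the delete succeeded.

```shell
# Remove any previous export (-f: no error if the file is absent),
# then crawl only if the removal succeeded.
rm -f data.jl && scrapy crawl myspider -o data.jl
```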
Granitosaurus

Use a capital -O (available in newer Scrapy versions), which overwrites the output file instead of appending to it:

scrapy crawl spider -O spider_output.csv
Suraj Rao

The option -t defines the file format, like json, csv, ...

The option -o FILE dumps the scraped items into FILE (use - for stdout).

> filename redirects that stdout into filename.

Putting it all together, to overwrite the previous export file instead of appending:

scrapy crawl spider -t csv -o - > spider.csv

or for json format:

scrapy crawl spider -t json -o - >spider.json
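One caveat with this redirection approach: Scrapy writes its log to stderr, so only the feed output lands in the file, but the log still scrolls by in the terminal. A sketch adding --nolog (same hypothetical spider name as above) silences it:

```shell
# Stdout carries only the exported items; --nolog suppresses the stderr log.
scrapy crawl spider -t json --nolog -o - > spider.json
```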

Katja Süss