
I wrote a Scrapy spider. It exports data to a file whose name I pass on the command line: E:\Anaconda3\envs\Blog2Doc\Lib\site-packages\scrapy\cmdline.py runspider blog2doc_scrapy\spiders\blog_spider.py -o ..\data\out.html. If this file already exists, the spider just appends content to the existing file. How can I check whether the output file already exists and, if it does, delete it? For exporting to the file I wrote a Blog2DocExporter(BaseItemExporter) class. It does not open the output file itself; its constructor receives an already opened file object. So inside this exporter class I can't check whether the exported file already exists.

osya

1 Answer


How Scrapy handles existing output files (appending to them instead of overwriting them) is a known open issue. See for example:

I have proposed a fix myself that renames output files with incrementing suffixes, but the implementation is not backward compatible. You may find it useful nonetheless: https://github.com/scrapy/scrapy/pull/2093
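
The incrementing-suffix idea can be sketched with a small standalone helper. This is a hypothetical function (next_free_path), not the code from the pull request, and the name-1.ext, name-2.ext naming scheme is an assumption:

```python
from pathlib import Path


def next_free_path(path):
    """Return path if it is unused, else the first free 'name-N.ext' sibling.

    Hypothetical helper illustrating incrementing suffixes; the naming
    scheme is an assumption, not the one used in the pull request.
    """
    original = Path(path)
    if not original.exists():
        return original
    stem, suffix = original.stem, original.suffix
    n = 1
    while True:
        # Keep the same directory, only change the file name.
        candidate = original.with_name(f"{stem}-{n}{suffix}")
        if not candidate.exists():
            return candidate
        n += 1
```

Note that checking for a free name and then creating the file is not atomic, so two processes writing into the same directory at the same moment could still pick the same name.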

That pull request changes FileFeedStorage, but you could implement something similar yourself; see this other answer for how to use such a custom feed storage class.

paul trmbrth