In Scrapy, how do I check whether the exported file already exists?

Question

I wrote a Scrapy spider. It exports data to a file whose name I pass on the command line: E:\Anaconda3\envs\Blog2Doc\Lib\site-packages\scrapy\cmdline.py runspider blog2doc_scrapy\spiders\blog_spider.py -o ..\data\out.html. If this file already exists, the spider just appends content to the existing file. How can I check whether the output file already exists and, if it does, delete it? For exporting to the file I wrote a Blog2DocExporter(BaseItemExporter) class. It does not open the output file itself; its constructor receives an already opened file object. So inside this exporter class I can't check whether the output file already exists.

score 0 · Answer 1 · edited May 23 '17 at 12:17

Getting Scrapy to overwrite output files, rather than append to them, is a known open issue. See for example:

I have proposed a fix myself that renames files with incrementing suffixes, but the implementation is not backward compatible. You may find it useful nonetheless: https://github.com/scrapy/scrapy/pull/2093

The pull request changes FileFeedStorage itself, but you could implement something similar in your own class; see this other answer for how to use such a custom feed storage class.
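As a rough illustration of the custom feed storage idea, here is a minimal sketch of a storage that deletes an existing output file before writing. It is not the code from the pull request. The class name OverwritingFileStorage is hypothetical, the class is written standalone (it only mirrors the open/store methods that Scrapy's file storage exposes), and it assumes the URI it receives is a plain filesystem path. In a real project you would more likely subclass scrapy.extensions.feedexport.FileFeedStorage and override open().

```python
import os


class OverwritingFileStorage:
    """Hypothetical feed storage that replaces an existing output file
    instead of appending to it."""

    def __init__(self, uri, *, feed_options=None):
        # Assumption: `uri` is a plain filesystem path such as ..\data\out.html,
        # not a file:// URI. `feed_options` is accepted but ignored here.
        self.path = os.path.abspath(uri)

    def open(self, spider):
        # Make sure the target directory exists.
        os.makedirs(os.path.dirname(self.path), exist_ok=True)
        # Check whether a file from a previous run exists and delete it.
        if os.path.exists(self.path):
            os.remove(self.path)
        # Open a fresh file for the exporter to write into.
        return open(self.path, "wb")

    def store(self, file):
        # Nothing to upload for a local file; just close it.
        file.close()
```

A class like this would be registered through the FEED_STORAGES setting so that Scrapy uses it for the relevant URI scheme. Recent Scrapy releases (2.4 and later, as far as I know) also provide a -O command-line option that overwrites the output file, which may make a custom storage unnecessary.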
