I wrote a Scrapy spider. It exports data to a file whose name I pass on the command line: E:\Anaconda3\envs\Blog2Doc\Lib\site-packages\scrapy\cmdline.py runspider blog2doc_scrapy\spiders\blog_spider.py -o ..\data\out.html
If this file already exists, the spider just appends content to the existing file. How can I check whether the output file already exists and, if it does, delete it? For exporting to a file I wrote a Blog2DocExporter(BaseItemExporter) class. It does not open the output file itself; its constructor receives an already opened file object. So in this exporter class I can't check whether the output file already exists.
osya
1 Answer
Scrapy appending to an existing output file instead of overwriting it is a known open issue. See for example:
- output as xml appending to existing file when spider re-executed resulting in invalid xml
- Add a command-line option for overwriting exported file
I have proposed a fix myself that renames files with incrementing suffixes, but the implementation is not backward compatible. You may find it useful nonetheless: https://github.com/scrapy/scrapy/pull/2093
It changes FileFeedStorage, but you could implement something similar yourself; see this other answer for how to use such a custom feed storage class.
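
As a rough illustration of the custom feed storage idea, here is a minimal sketch of a storage class that starts each run with a fresh file. It is not the code from the pull request above. The class name OverwritingFileStorage is hypothetical, and the sketch assumes the storage only needs a constructor taking the URI plus open(spider) and store(file) methods. It uses only the standard library, so it can be tried without Scrapy installed.

```python
import os
from urllib.parse import urlparse
from urllib.request import url2pathname


class OverwritingFileStorage:
    """Hypothetical feed storage that replaces the output file on every run."""

    def __init__(self, uri, *, feed_options=None):
        # feed_options is accepted but ignored (assumption: newer Scrapy
        # versions may pass it as a keyword argument).
        # Accept either a plain filesystem path or a file:// URI.
        if uri.startswith("file://"):
            self.path = url2pathname(urlparse(uri).path)
        else:
            self.path = uri

    def open(self, spider):
        # Create the target directory if it does not exist yet.
        dirname = os.path.dirname(self.path)
        if dirname:
            os.makedirs(dirname, exist_ok=True)
        # "wb" truncates an existing file, so no separate
        # exists-then-delete check is needed.
        return open(self.path, "wb")

    def store(self, file):
        # The exporter has finished writing; just close the file.
        file.close()
```

To try it with Scrapy, the class would presumably be registered for the file scheme through the FEED_STORAGES setting, using whatever module path it lives under in your project. Because the file is opened in write mode rather than append mode, the exporter still receives an already opened file object and needs no changes.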

paul trmbrth