I am using Scrapy and Scrapyd to monitor certain sites. The output files are compressed JSON Lines files. Right after I schedule a job with Scrapyd, I can see the output file being created, and it keeps growing as the spider scrapes.
My problem is that I can't tell when the output file is ready, i.e. when the spider has finished. One way to signal this would be to rename the output file to something like "output.done", so that my other programs can list these files and process them.
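To make the renaming idea concrete, here is a minimal sketch of the convention using only the Python standard library. The names `mark_done`, `list_done` and the `.done` suffix are hypothetical, and the sketch does not show how to hook the rename into Scrapy or Scrapyd, which is the part I am missing.

```python
from pathlib import Path

DONE_SUFFIX = ".done"  # assumed marker convention, not something Scrapyd provides


def mark_done(path):
    """Rename a finished output file so that its name ends with DONE_SUFFIX."""
    src = Path(path)
    dst = src.with_name(src.name + DONE_SUFFIX)
    # Path.replace is atomic when source and target are on the same filesystem,
    # so a consumer never sees a half-renamed file.
    src.replace(dst)
    return dst


def list_done(directory):
    """Return the finished files in a directory, sorted by name."""
    return sorted(Path(directory).glob("*" + DONE_SUFFIX))
```

The consumers would then only ever call `list_done(...)` and ignore files that are still being written.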
My current method is to check the modification time of the file: if it hasn't changed for five minutes, I assume the spider is done. However, five minutes sometimes doesn't seem to be enough, and I really hope I don't need to extend it to 30 minutes.
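A minimal sketch of this kind of modification-time check, assuming a five-minute quiet period. The function name `is_idle` and its parameters are hypothetical and only illustrate the heuristic:

```python
import os
import time


def is_idle(path, quiet_seconds=300, now=None):
    """Return True if the file at `path` has not been modified for `quiet_seconds`.

    `now` can be passed explicitly (as a Unix timestamp) to make the check testable.
    """
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path) >= quiet_seconds
```

The weakness is visible here: a spider that is slow or waiting on a download for longer than `quiet_seconds` looks exactly like a finished one.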