My problem is as follows:
I have 3 item pipelines:
- one FilesPipeline that downloads archives
- one ArchiveUnpackerPipeline that unpacks an archive
- one SymbolicLinkerPipeline that generates symbolic links to the contents of those archives
The issue is the following:
Due to the way the website is built, I may have to generate symbolic links to the same archive from different items. If everything ran sequentially, it might look like this:
- item_1 initiates download of archive_1
- item_1 initiates unpacking of archive_1
- item_1 initiates symbolic linking of files_1 from archive_1
- item_2 sees that archive_1 was downloaded in the past, returns item_1
- item_2 sees that archive_1 was unpacked in the past, returns item_1
- item_2 initiates symbolic linking of files_2 from archive_1
But because a download may take a while, item_2 can arrive while archive_1 is still being downloaded, so the same file gets downloaded (and then unpacked) twice, which leads to errors.
Is there an elegant way to tackle this problem? My first guess is that it could work with a global dictionary that keeps track of the status of each download_url, with states like downloading, finished_downloading, unpacking, and finished_unpacking, combined with Twisted Deferreds. But as I have never worked with Twisted before, I am not entirely sure.