
My problem is as follows:

I have three item pipelines:

  • one FilesPipeline that downloads archives
  • one ArchiveUnpackerPipeline that unpacks an archive
  • one SymbolicLinkerPipeline that generates symbolic links to the contents of those archives

The issue is the following:

Due to the way the website is built, I may have to generate symbolic links to the same archive from different items. If everything ran sequentially, it would look like this:

  1. item_1 initiates download of archive_1
  2. item_1 initiates unpacking of archive_1
  3. item_1 initiates symbolic linking of files_1 from archive_1
  4. item_2 sees that archive_1 was downloaded in the past, returns item_1
  5. item_2 sees that archive_1 was unpacked in the past, returns item_1
  6. item_2 initiates symbolic linking of files_2 from archive_1

But because a download may take a while, item_2 can reach the pipelines before item_1's download of archive_1 has finished. The same file is then downloaded twice (and likewise unpacked twice), which leads to errors.

Is there an elegant way to tackle this problem? My first guess is that it may work with a global dictionary that keeps track of the status of each download_url, with states like downloading, finished_downloading, unpacking and finished_unpacking, combined with Twisted's Deferred. But as I have never worked with Twisted before, I am not entirely sure.
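
To make the dictionary idea concrete, here is a minimal sketch of what I have in mind. It is only an illustration: it uses the standard library's asyncio tasks as a stand-in for Twisted Deferreds, and OncePerKey, fake_download and the example URL are hypothetical names of my own, not anything from Scrapy. Instead of tracking explicit states, the dictionary maps each download_url to the in-flight (or finished) job, so a second item asking for the same URL just waits for the first item's result. I have not checked how this would plug into the actual pipelines.

```python
import asyncio


class OncePerKey:
    """Run an async job at most once per key (hypothetical helper).

    Later callers with the same key await the result of the first call
    instead of starting the job again.
    """

    def __init__(self):
        self._tasks = {}  # key (e.g. download_url) -> asyncio.Task

    async def run(self, key, job):
        task = self._tasks.get(key)
        if task is None:
            # First caller for this key: start the job and remember it.
            task = asyncio.create_task(job(key))
            self._tasks[key] = task
        # shield() keeps the shared job alive if one waiting caller is cancelled.
        return await asyncio.shield(task)


async def demo():
    calls = []  # records how often the "download" really ran

    async def fake_download(url):
        # Stand-in for the real download; sleeps to simulate a slow transfer.
        calls.append(url)
        await asyncio.sleep(0.01)
        return "/tmp/" + url.rsplit("/", 1)[-1]

    downloads = OncePerKey()
    url = "http://example.com/archive_1.zip"
    # item_1 and item_2 both ask for archive_1 at the same time.
    paths = await asyncio.gather(
        downloads.run(url, fake_download),
        downloads.run(url, fake_download),
    )
    return calls, paths


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The same helper could in principle be reused for the unpacking step with a second instance keyed by archive path. One caveat of this sketch: a failed job stays cached, so every later caller for that key gets the same exception unless the entry is removed.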

Julien Marrec
ZeeD26
