http://doc.scrapy.org/en/latest/topics/media-pipeline.html
When the item reaches the FilesPipeline, the URLs in the file_urls field are scheduled for download using the standard Scrapy scheduler and downloader (which means the scheduler and downloader middlewares are reused), but with a higher priority, processing them before other pages are scraped. The item remains “locked” at that particular pipeline stage until the files have finished downloading (or fail for some reason).
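For context, the setup that the quoted passage describes looks roughly like the sketch below. It assumes a recent Scrapy version, where the built-in pipeline lives at `scrapy.pipelines.files.FilesPipeline` (older releases used a different module path). The storage directory and the example URL are placeholders.

```python
# settings.py (sketch): enable the built-in FilesPipeline.
ITEM_PIPELINES = {
    "scrapy.pipelines.files.FilesPipeline": 1,
}

# Directory where downloaded files are stored; this path is a placeholder.
FILES_STORE = "/tmp/scrapy-files"

# An item yielded by the spider. The pipeline reads the URLs to fetch from
# the file_urls field; the URL below is a placeholder.
example_item = {
    "file_urls": ["http://example.com/report.pdf"],
}
```

With this configuration, each item's files are fetched as soon as the item reaches the pipeline, which is the behaviour the question is about.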
I want to do the exact opposite: scrape all HTML URLs first, then download all media files at once. How can I do that?
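One way to picture the goal is the sketch below: an item pipeline that only records the file URLs while the crawl runs and writes them out when the spider closes, so that a separate step can download everything afterwards. This is an illustration of the desired behaviour, not a known Scrapy feature. The class name `CollectFileUrlsPipeline` and the output file `pending_files.jsonl` are hypothetical, and the sketch assumes dict-like items with a file_urls field.

```python
import json


class CollectFileUrlsPipeline:
    """Hypothetical pipeline: record file URLs instead of downloading them."""

    def __init__(self, output_path="pending_files.jsonl"):
        # output_path is an assumed name for the list of deferred downloads.
        self.output_path = output_path
        self.pending = []   # URLs in the order they were first seen
        self.seen = set()   # used to skip duplicate URLs

    def process_item(self, item, spider):
        # Called for every scraped item; nothing is downloaded here.
        for url in item.get("file_urls", []):
            if url not in self.seen:
                self.seen.add(url)
                self.pending.append(url)
        return item

    def close_spider(self, spider):
        # Called once when the crawl ends: write one JSON object per line
        # so a later step can fetch all files in a single batch.
        with open(self.output_path, "w", encoding="utf-8") as fh:
            for url in self.pending:
                fh.write(json.dumps({"url": url}) + "\n")
```

A pipeline like this would be enabled through `ITEM_PIPELINES` in place of the FilesPipeline. The resulting file could then be fed to a second spider or a standalone download script once all HTML pages have been scraped.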