I'm attempting to download files from a website that uses a CDN for distribution. The URLs on the download page all end with file.pdf, but clicking a link in a browser downloads a file with a descriptive file name (e.g. 'invoice1234.pdf'). Parsing the URL to get the file name therefore results in every file being named file.pdf. I would like to use the same file name that the browser uses. My code looks something like this:

  filename = File.basename(mov_download_link.href)
  agent.pluggable_parser.default = Mechanize::Download
  agent.get(mov_download_link.href).save("#{path}/#{filename}")
  agent.pluggable_parser.default = Mechanize::File

Any ideas would be appreciated!

JP.

1 Answer

That filename is probably in a header that looks like this:

{'content-disposition' => 'filename="invoice1234.pdf"'}

If so:

f = agent.get(mov_download_link.href)
filename = f.header['content-disposition'][/"(.*)"/, 1]
f.save("#{path}/#{filename}")
pguardiario
  • Or it could look like `"content-disposition"=>"attachment; filename=invoice1234.pdf"`... in which case a quick `f.header['content-disposition'].split('=')[1]` will do just fine as well. – poweratom Jan 16 '17 at 05:51
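
Since the header may arrive either quoted or unquoted, a single helper that handles both forms may be more robust than either one-liner. The following is a minimal sketch, not part of Mechanize: the method name `filename_from_content_disposition` and its `fallback` parameter are hypothetical, and it assumes only the plain `filename=` parameter matters (the encoded `filename*=` form is ignored).

```ruby
# Hypothetical helper (not part of Mechanize): extract the file name from a
# Content-Disposition header value, whether or not the name is quoted.
def filename_from_content_disposition(value, fallback = "file.pdf")
  # No header at all: use the fallback name.
  return fallback if value.nil?

  # Match filename="name" (quoted) or filename=name (unquoted, up to a semicolon).
  match = value.match(/filename\s*=\s*(?:"([^"]*)"|([^;]+))/i)
  return fallback unless match

  name = (match[1] || match[2]).strip
  # Keep only the last path component so the header cannot point outside
  # the target directory.
  name = File.basename(name)
  name.empty? ? fallback : name
end
```

With the code from the answer, this could be called as `filename_from_content_disposition(f.header['content-disposition'], File.basename(mov_download_link.href))` before `f.save`, so the URL-derived name is used only when the header is missing or has no file name.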