Unable to download PDF using Python

Question

I am trying to download a PDF with a Python script. I have tried urllib, pdfkit, and curl. Each time, I get the HTML/JS content of the page instead of the PDF file. How can I solve this?

using pdfkit:

import pdfkit
pdfkit.from_url('http://www.kubota.com/product/BSeries/B2301/pdf/B01_Specs.pdf', 'out.pdf', options = {'javascript-delay':'10000'})

using urllib:

import urllib2
response = urllib2.urlopen('http://www.kubota.com/product/BSeries/B2301/pdf/B01_Specs.pdf')
file = open("out.pdf", 'wb')
file.write(response.read())
file.close()

score 2 · Accepted Answer · answered Apr 24 '17 at 23:38

You could use the urllib3 library:

import urllib3

def download_file(download_url):
    # Fetch the URL and write the raw response body to disk.
    http = urllib3.PoolManager()
    response = http.request('GET', download_url)
    with open('output.pdf', 'wb') as f:
        f.write(response.data)

if __name__ == '__main__':
    download_file('http://www.kubota.com/product/BSeries/B2301/pdf/B01_Specs.pdf')

score 0 · Answer 2 · answered Apr 24 '17 at 23:50

You should be able to do it with requests pretty easily:

import requests

r = requests.get('http://www.axmag.com/download/pdfurl-guide.pdf') #your url here
with open('your_file_path_here.pdf', 'wb') as f:
    f.write(r.content)

slearner

Actually, I just tried it with your link, and it looks like there's a captcha or some sort of authentication before you can get to the PDF you're looking for, so that's probably the issue rather than your code – slearner Apr 24 '17 at 23:56
Thanks for the response. How do I resolve it? Do I need to send some information through headers to crack the authentication? – Vamshi Kolanu Apr 24 '17 at 23:59
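
To confirm the diagnosis in the comment above, it may help to check whether the server actually returned a PDF before saving anything. The following is a minimal sketch, assuming Python 3 and only the standard library. The names looks_like_pdf and download_pdf are hypothetical helpers, not part of any library. It relies on the fact that PDF files begin with the bytes %PDF-, and it does not get past a captcha or login page; it only makes the failure visible instead of silently writing HTML to out.pdf.

```python
import urllib.request

# Every PDF file starts with this signature.
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(body: bytes) -> bool:
    """Return True if the payload begins with the PDF signature."""
    return body.startswith(PDF_MAGIC)


def download_pdf(url: str, path: str) -> None:
    """Download url to path, refusing to save anything that is not a PDF.

    Hypothetical helper: raises ValueError when the server sends
    something else (for example an HTML captcha or login page).
    """
    with urllib.request.urlopen(url) as response:
        content_type = response.headers.get("Content-Type", "")
        body = response.read()

    if not looks_like_pdf(body):
        raise ValueError(
            "Expected a PDF but got Content-Type %r; first bytes: %r"
            % (content_type, body[:40])
        )

    with open(path, "wb") as f:
        f.write(body)
```

Calling download_pdf(url, 'out.pdf') with the URL from the question would then either save a real PDF or raise an error showing the Content-Type and the first bytes of what the server sent, which should reveal whether an intermediate page is being served.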
