How to iterate and download multiple pdfs using selenium and Python

Question

I am a bit new to using Selenium and Python. Below is the code that I am trying to run to download multiple files.

from selenium import webdriver
driver = webdriver.Chrome(executable_path=r'C:\chromedriver_win32\chromedriver.exe')
cusip=['abc123','def456','ghi789']
for a in cusip:

    page=driver.get("http://mylink=" + str(a) + ".pdf")
    with open(a + '.pdf', 'wb') as f:
        for chunk in page.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)

The error that I receive is below:

Traceback (most recent call last):
  File "C:/Users/shashi.singh/PycharmProjects/HiSSS/Selenium.py", line 13, in <module>
    for chunk in page.iter_content(chunk_size=1024):
AttributeError: 'NoneType' object has no attribute 'iter_content'

Are you sure `driver.get("http://mylink=" + str(a) + ".pdf")` is returning anything? — Colin Ricardo, Apr 17 '18 at 14:39
Yes Colin, it is returning me the web url that basically has the pdf document for every item in the cusip list. — Shashi Shankar Singh, Apr 17 '18 at 14:45
I can guarantee you that `page` is not an HTML page, it is a `NoneType` — user3483203, Apr 17 '18 at 14:50

score 0 · Answer 1 · answered Apr 17 '18 at 14:41

I would not recommend using Selenium for this task. If you have a list of URLs, simply use urllib.request.urlretrieve:

In [5]: from urllib import request

In [6]: request.urlretrieve('https://arxiv.org/pdf/1409.8470.pdf', r'C:\users\chris\test.pdf')
Out[6]: ('C:\\users\\chris\\test.pdf', <http.client.HTTPMessage at 0x59628d0>)

Just pass each URL as the first argument and the destination path as the second argument.
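
Applied to the list from the question, that could look like the sketch below. The `download_all` helper and its `base_url` and `dest_dir` parameters are illustrative names, not part of the original answer, and it assumes every file lives at `base_url + id + ".pdf"` as in the question.

```python
import os
from urllib import request


def download_all(ids, base_url, dest_dir):
    """Download base_url + id + '.pdf' for every id and return the saved paths."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for item in ids:
        url = base_url + str(item) + ".pdf"
        target = os.path.join(dest_dir, str(item) + ".pdf")
        # urlretrieve(url, filename) writes the response body to filename
        path, _headers = request.urlretrieve(url, target)
        saved.append(path)
    return saved
```

With the values from the question, the call would be something like `download_all(['abc123', 'def456', 'ghi789'], "http://mylink=", r"C:\directory")`.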

user3483203


Thanks Chriz. That makes sense if we already have the URLs. One question, though: if we have a list like [(a.url,b.pdf),(c.url,d.pdf)], how do you suggest fetching the URL from the first value of each tuple and saving it under the name given by the second value? – Shashi Shankar Singh May 31 '18 at 10:09
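
For a list of (url, filename) pairs like the one described in this comment, tuple unpacking in the `for` loop keeps the two values together. This is only a minimal sketch: `download_pairs` and `dest_dir` are made-up names, and each pair is assumed to hold a full URL followed by a bare file name.

```python
import os
from urllib import request


def download_pairs(pairs, dest_dir="."):
    """Fetch each (url, filename) pair and save it as dest_dir/filename."""
    saved = []
    for url, filename in pairs:  # first value is the URL, second the file name
        target = os.path.join(dest_dir, filename)
        request.urlretrieve(url, target)
        saved.append(target)
    return saved
```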

score 0 · Accepted Answer · answered Apr 18 '18 at 13:13

Thanks for the help, everyone. Below is the code that I am using, and it's working fine.

from selenium import webdriver

cusip = ['abc123', 'def456', 'ghi789']  # identifiers appended to the URL

tgt = "C:\\directory"  # target directory for the downloaded files
options = webdriver.ChromeOptions()
# Disable Chrome's built-in PDF viewer so PDFs are downloaded instead of displayed
profile = {"plugins.plugins_list": [{"enabled": False, "name": "Chrome PDF Viewer"}],
           "download.default_directory": tgt}
options.add_experimental_option("prefs", profile)

# Create the driver once, after the options are configured
# (Selenium 4+: pass service=Service(path) instead of executable_path)
driver = webdriver.Chrome(executable_path=r'C:\chromedriver_win32\chromedriver.exe', options=options)

for a in cusip:
    driver.get("http://mylink=" + str(a) + ".pdf")  # request the PDF for each item in the cusip list

print('Process completed successfully')

Here cusip is the list of identifiers that I iterate over and append to the URL of each page I need to download; modify it as needed.
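
One caveat with this approach: `driver.get()` can return before the file is fully written to disk, so closing the browser right after the loop may cut downloads short. Chrome keeps in-progress downloads under a `.crdownload` suffix, so one possible way to wait is to poll the target directory until none remain. The sketch below works under that assumption; `wait_for_downloads` and its parameters are hypothetical names, not a Selenium API, and the expected file names are assumed to match what the server actually sends.

```python
import glob
import os
import time


def wait_for_downloads(directory, expected_names=(), timeout=60.0, poll=0.5):
    """Return True once all expected files exist and no *.crdownload
    files remain in directory; return False if timeout seconds pass first."""
    deadline = time.monotonic() + timeout
    while True:
        pending = glob.glob(os.path.join(directory, "*.crdownload"))
        missing = [name for name in expected_names
                   if not os.path.exists(os.path.join(directory, name))]
        if not pending and not missing:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

After the loop, a call such as `wait_for_downloads(tgt, [a + '.pdf' for a in cusip])` could run before `driver.quit()`.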
