I want to download a webpage using Selenium with Python. I am using the following code:

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys

chromeOptions = webdriver.ChromeOptions()
chromeOptions.add_argument('--save-page-as-mhtml')
# Pass the options to the driver so the argument takes effect
driver = webdriver.Chrome(options=chromeOptions)

driver.get("http://www.yahoo.com")

# Attempt to send Ctrl+S to the page to open the "Save As" dialog
saveas = ActionChains(driver).key_down(Keys.CONTROL)\
         .send_keys('s').key_up(Keys.CONTROL)
saveas.perform()
print("done")

However, the above code isn't working. I am using Windows 7. Is there any way I can bring up the "Save As" dialog box?

Thanks, Karan

karan juneja

1 Answer

You can use the code below to save the page's HTML:

from selenium import webdriver
  
driver = webdriver.Chrome()
driver.get("http://www.yahoo.com")
with open("/path/to/page_source.html", "w", encoding='utf-8') as f:
    f.write(driver.page_source)

Just replace "/path/to/page_source.html" with the desired file path and file name.
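
The encoding='utf-8' argument matters on Windows, where open() may otherwise fall back to a legacy code page such as cp1252 and fail on characters that code page cannot represent. A minimal sketch of the difference (the sample string and the can_encode helper are made up for illustration):

```python
# Sample text containing a character (U+2603) that cp1252 has no mapping for.
sample = "<p>snowman: \u2603</p>"


def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# cp1252 rejects the character, UTF-8 accepts it.
print("cp1252:", can_encode(sample, "cp1252"))
print("utf-8:", can_encode(sample, "utf-8"))
```

Writing the file with an explicit UTF-8 encoding avoids depending on whatever default the operating system picks.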

Update

If you need to save the complete page (including CSS, JS, ...), you can use the following solution:

pip install pyahk # from command line

Python code:

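from selenium import webdriver
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
import ahk

# Path to the local Firefox executable
firefox = FirefoxBinary("C:\\Program Files (x86)\\Mozilla Firefox\\firefox.exe")

driver = webdriver.Firefox(firefox_binary=firefox)
driver.get("http://www.yahoo.com")

# Start the AutoHotkey interpreter and wait until it is ready
ahk.start()
ahk.ready()
ahk.execute("Send,^s")                       # press Ctrl+S
ahk.execute("WinWaitActive, Save As,,2")     # wait up to 2 seconds for the dialog
ahk.execute("WinActivate, Save As")          # make sure the dialog has focus
ahk.execute("Send, C:\\path\\to\\file.htm")  # type the target file path
ahk.execute("Send, {Enter}")                 # confirm the dialog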
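
If the goal is a single-file MHTML copy rather than driving the Save As dialog, one possible alternative for Chrome is to ask the browser for a snapshot through the DevTools Protocol. This is a different technique from the AutoHotkey approach above and only a sketch: it assumes a Selenium version whose Chrome driver provides execute_cdp_cmd, and save_mhtml and the output file name are hypothetical names chosen for illustration.

```python
def save_mhtml(driver, path):
    """Ask a Chromium-based driver for an MHTML snapshot and write it to path.

    `driver` is assumed to expose execute_cdp_cmd(cmd, cmd_args), as the
    Selenium Chrome driver does.
    """
    # Page.captureSnapshot is a Chrome DevTools Protocol command; its result
    # is a dict whose "data" key holds the serialized page.
    snapshot = driver.execute_cdp_cmd("Page.captureSnapshot", {"format": "mhtml"})
    # newline="" keeps the line endings exactly as Chrome produced them.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(snapshot["data"])
    return path


# Usage (assumes Chrome and a matching chromedriver are installed):
# from selenium import webdriver
# driver = webdriver.Chrome()
# driver.get("http://www.yahoo.com")
# save_mhtml(driver, "page.mhtml")
# driver.quit()
```

The resulting .mhtml file can be opened directly in Chrome; how faithfully it reproduces a given page depends on the site.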
Ms.kitty
Andersson
  • Buddy, thanks for the prompt reply. But I am getting the following error: Traceback (most recent call last): File "C:\Users\karanjuneja\Desktop\Eclipse Workspace\Library\test1.py", line 35, in <module> f.write(driver.page_source) File "C:\Users\karanjuneja\AppData\Local\Programs\Python\Python35-32\lib\encodings\cp1252.py", line 19, in encode return codecs.charmap_encode(input,self.errors,encoding_table)[0] UnicodeEncodeError: 'charmap' codec can't encode characters in position 106288-106293: character maps to <undefined> – karan juneja Mar 20 '17 at 10:08
  • I want to save the file in .mhtml format. I used the following code: `from selenium import webdriver` `driver = webdriver.Chrome()` `driver.get("http://www.yahoo.com")` `with open("/path/to/page_source.html", "w", encoding="utf-8") as f: f.write(driver.page_source)` It saved the page, but the file contained only the source code. I cannot view the page as it originally appeared. Any suggestions? – karan juneja Mar 20 '17 at 10:28
  • You mean you want browser to open your local (downloaded) page copy just like you get it from server directly? – Andersson Mar 20 '17 at 10:32
  • Yes buddy, just like the webpage appears when we open it. – karan juneja Mar 20 '17 at 10:33
  • Thanks so much for the answer. Can you please provide the same solution for ChromeDriver, as I am using the Chrome browser? I would be grateful. :) – karan juneja Mar 20 '17 at 13:41
  • Just replace `driver = webdriver.Firefox(firefox_binary=firefox)` with `driver = webdriver.Chrome()` – Andersson Mar 20 '17 at 13:45
  • Even pyahk didn't work, as I have Python 3.6, which I think pyahk does not support. Instead, I was able to do it with pyautogui. However, when sending Ctrl+S to open the Save As dialog box, the page sometimes opens the Alt+Space menu instead. I don't understand why that happens, as the command entered is only Ctrl+S. – karan juneja Mar 20 '17 at 17:05
  • Hm... I use `Python 3.5` + `Win7`. My code works well with `Chrome`. Did you get any exception? – Andersson Mar 20 '17 at 17:07
  • I don't know; I could not even install pyahk using "pip install pyahk". – karan juneja Mar 20 '17 at 17:10
  • pyahk is only for Python 2. It looks like the source code is gone from Bitbucket, although there is a fix: https://stackoverflow.com/a/44767894/4549682 – wordsforthewise Feb 05 '21 at 16:23
  • What if the website has anti-scrape blockers that block Selenium and BSoup? – Fandango68 Jun 22 '23 at 00:54
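
For the pyautogui route mentioned in the comments, a rough sketch of the same Save As sequence might look like the following. The save_page_as helper, its gui parameter, and the two-second pause are assumptions made for illustration; the sketch relies only on pyautogui's hotkey, typewrite, and press functions and requires the browser window to have keyboard focus.

```python
import time


def save_page_as(gui, target_path, dialog_wait=2.0):
    """Drive the browser's Save As dialog with simulated keyboard input.

    `gui` is expected to be the pyautogui module (or any object offering the
    same hotkey/typewrite/press functions).
    """
    gui.hotkey("ctrl", "s")      # open the Save As dialog in the focused window
    time.sleep(dialog_wait)      # crude fixed wait for the dialog to appear
    gui.typewrite(target_path)   # type the destination path into the file-name box
    gui.press("enter")           # confirm the dialog


# Usage (assumes pyautogui is installed and the browser window is focused):
# import pyautogui
# save_page_as(pyautogui, "C:\\path\\to\\file.htm")
```

Passing the module in as a parameter keeps the helper easy to try out with a stand-in object before pointing it at a real desktop session.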