Python Ctrl + S for current open url

Question

I'm running Windows 10 on my PC laptop. There seem to be endless posts related to this in which people are advised to use wget, Selenium, HTTrack, and so on. I know, definitively, that all I want to do is write a script that looks at a web page I specify, does a Ctrl+S, and outputs the HTML file to my Documents folder or a destination I specify.

>>> from selenium import webdriver
>>> from selenium.webdriver.common.action_chains import ActionChains
>>> from selenium.webdriver.common.keys import Keys
>>> br = webdriver.Chrome()
>>> br.get(r"http://www.somewebpage.com")
>>> save_me = ActionChains(br).key_down(Keys.CONTROL).key_down('s').key_up('s')
>>> save_me.perform()

And then what? Where does it go?

I also tried this:
>>> import wget
>>> dir = r"C:\Users\user\Documents\GIS DataBase"
>>> url = br.current_url
>>> wget = "wget -p -k -p {} {}".format(dir, url)
>>> os.system(wget)
1

It returned a 1. What does this mean? Where is my saved html file? I can't find anything anywhere.

Lastly, I tried running HTTrack. It gave me all the .js files and GIFs but none of my search results.

If I have the web page open, I can manually hit Ctrl+S, at which point I am prompted to save the .html file at a destination of my choosing. I can then open this with a text editor and all the information that I need for geocoding is there.

why not use br.page_source? However it will only download the HTML (excludes JS etc). I believe it returns a string which you can then write to file. — ChickenFeet, Jun 20 '17 at 01:41
@ChickenFeet Works like a charm. Didn't think it would because viewing page source manually only revealed the html from the log on screen. This is exactly what I need. I take back my "definitively" comment. Way better approach. Post the answer and I'll check it. Thanks! — geoJshaun, Jun 20 '17 at 19:33
Glad to help. See answer for further information, regarding loading the page before running `page_source` and supporting unicode. — ChickenFeet, Jun 21 '17 at 04:09

score 0 · Accepted Answer · answered Jun 21 '17 at 04:05

I think WebDriver.page_source is what you're after. See the Selenium WebDriver documentation for details.

This property should be read after the page has loaded, so you may have to perform a 'wait until element loaded' step to get the entire page. See the "wait for element" Q&A.

Solution example:

# Optionally wait for the page to finish loading, then:
page_src = br.page_source  # rendered HTML as a str
# Write as UTF-8 so non-ASCII characters are preserved
with open('page.html', 'w', encoding='utf-8') as f:
    f.write(page_src)
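
If it helps, here is a slightly fuller sketch that combines the wait and the save. The names `save_page_source` and `fetch_and_save`, the 10-second timeout, and waiting on the `<body>` tag are my own assumptions for illustration. In practice you would probably wait for an element specific to your search results instead.

```python
from pathlib import Path


def save_page_source(driver, dest):
    """Write driver.page_source to dest as UTF-8 and return the path.

    `driver` is any object exposing a `page_source` string, such as a
    Selenium WebDriver. The helper name is hypothetical.
    """
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    dest.write_text(driver.page_source, encoding="utf-8")
    return dest


def fetch_and_save(url, dest, timeout=10):
    """Open url in Chrome, wait for <body> to exist, then save the HTML."""
    # Imported here so save_page_source works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Assumption: <body> being present is a good enough "loaded" signal.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.TAG_NAME, "body"))
        )
        return save_page_source(driver, dest)
    finally:
        driver.quit()  # always close the browser, even on timeout
```

With an already open session like the one in the question, calling `save_page_source(br, r"C:\Users\user\Documents\page.html")` should be enough; `fetch_and_save` is only for starting from a URL.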
