I'm running Windows 10 on my PC laptop. There seem to be endless posts on this topic in which people are advised to use wget, Selenium, HTTrack, and so on. I know, definitively, that all I want to do is write a script that looks at a web page I specify, does the equivalent of Ctrl+S, and saves the HTML file to my Documents folder or another destination I specify.

>>> from selenium import webdriver
>>> from selenium.webdriver.common.action_chains import ActionChains
>>> from selenium.webdriver.common.keys import Keys
>>> br = webdriver.Chrome()
>>> br.get(r"http://www.somewebpage.com")
>>> save_me = ActionChains(br).key_down(Keys.CONTROL).key_down('s').key_up('s')
>>> save_me.perform()

And then what? Where does it go?

I also tried this:
>>> import os
>>> import wget
>>> dir = r"C:\Users\user\Documents\GIS DataBase"
>>> url = br.current_url
>>> wget = "wget -p -k -p {} {}".format(dir, url)
>>> os.system(wget)
1

It returned a 1. What does this mean? Where is my saved html file? I can't find anything anywhere.

Lastly, I tried running HTTrack. It gave me all the .js files and GIFs but none of my search results.

If I have the web page open, I can manually hit Ctrl+S, at which point I am prompted to save the .html file at a destination of my choosing. I can then open this with a text editor and all the information that I need for geocoding is there.

Fabrizio
geoJshaun
  • why not use br.page_source? However it will only download the HTML (excludes JS etc). I believe it returns a string which you can then write to file. – ChickenFeet Jun 20 '17 at 01:41
  • @ChickenFeet Works like a charm. Didn't think it would because viewing page source manually only revealed the html from the log on screen. This is exactly what I need. I take back my "definitively" comment. Way better approach. Post the answer and I'll check it. Thanks! – geoJshaun Jun 20 '17 at 19:33
  • Glad to help. See answer for further information, regarding loading the page before running `page_source` and supporting unicode. – ChickenFeet Jun 21 '17 at 04:09

1 Answer

I think WebDriver.page_source is what you're after. See documentation here.

This property should be read after the page has loaded, so you may need to wait for a particular element to appear first in order to capture the entire page. See wait for element Q&A.
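If there is no obvious element to wait for, one alternative is to poll `document.readyState` through `execute_script`. This is only a minimal sketch and not the element-based wait from the linked Q&A: `wait_for_ready`, its timeout, and its polling interval are hypothetical names and values, and a page that fills in search results via AJAX after loading may still need an element-based wait.

```python
import time


def wait_for_ready(driver, timeout=10.0, poll=0.25):
    """Poll document.readyState until it is 'complete' or the timeout expires.

    Returns True if the page reported 'complete', False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        # execute_script runs JavaScript in the page and returns its result.
        if driver.execute_script("return document.readyState") == "complete":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

With the session from the question this would be called as `wait_for_ready(br)` right after `br.get(...)` and before reading `br.page_source`.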

Solution example:

# optionally wait for the page to finish loading, then
page_src = br.page_source  # a str holding the rendered HTML
# write as UTF-8 so unicode characters are preserved (Python 3)
with open('page.html', 'w', encoding='utf-8') as f:
    f.write(page_src)
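To save to a destination of your choosing, as the question asks, the same idea can be wrapped in a small helper. This is a sketch under assumptions: `save_page_source` is a hypothetical name, it expects any object with a `page_source` attribute (such as the `br` driver above), and it assumes Python 3.

```python
from pathlib import Path


def save_page_source(driver, destination):
    """Write driver.page_source to destination as UTF-8 and return the path."""
    path = Path(destination)
    # Create the target folder if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    # page_source is a str; encode explicitly so non-ASCII text survives.
    path.write_text(driver.page_source, encoding="utf-8")
    return path
```

For example, `save_page_source(br, r"C:\Users\user\Documents\GIS DataBase\page.html")` would write the file into the folder mentioned in the question; the file name `page.html` is only a placeholder.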
ChickenFeet