How can I save a webpage, including its content, so that it can be viewed offline, using wget in Python? I am currently using the following code:

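import wget
from selenium import webdriver

# Open the page in a Chrome browser session
driver = webdriver.Chrome()
driver.get("http://www.yahoo.com")

# Download the URL's response body to a local file
wget.download("http://www.yahoo.com", "C:\\Users\\karanjuneja\\Downloads\\kj\\yahoo.mhtml")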

This works and stores an .mhtml file in the folder, but when you open the file you see only the page's source code, not the page as it appears online. Any suggestions? Thanks, Karan

karan juneja
  • The driver commands did not work for me, but when I commented them out, the third command worked fine; it just needed an opening quote around the output path. Is any code missing from your example for the cases where the driver commands are needed? The error I got was that driver was not recognized. I am using wget for the first time, so any help is appreciated. – TMWP Mar 25 '17 at 13:52

1 Answer

This wget command creates an offline copy of a site that you can view without internet access.

wget --mirror --convert-links --adjust-extension --page-requisites --no-parent http://example.org

--mirror – Makes (among other things) the download recursive.

--convert-links – Converts all links (including those to resources such as CSS stylesheets) to relative links, so that the copy is suitable for offline viewing.

--adjust-extension – Adds suitable extensions to filenames (html or css) depending on their content type.

--page-requisites – Downloads things like CSS stylesheets and images required to display the page properly offline.

--no-parent – When recursing, does not ascend to the parent directory. This is useful for restricting the download to only a portion of the site.

Thanks to Guy Rutenberg for providing the code in his forum which helped me too.
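
Since the question asks about doing this from Python, here is a minimal sketch of one way to run the same command with the standard subprocess module. It assumes the GNU wget command-line program is installed and on PATH (this is a separate tool from the wget Python package used in the question). The function names build_mirror_command and mirror_site and the dest_dir parameter are hypothetical, chosen only for illustration.

```python
import shutil
import subprocess


def build_mirror_command(url):
    """Return the wget argument list for an offline mirror of `url`."""
    return [
        "wget",
        "--mirror",            # recursive download, among other things
        "--convert-links",     # rewrite links for offline viewing
        "--adjust-extension",  # add .html/.css extensions where needed
        "--page-requisites",   # fetch CSS, images, etc. needed by the page
        "--no-parent",         # never ascend above the starting directory
        url,
    ]


def mirror_site(url, dest_dir="."):
    """Run wget inside dest_dir (hypothetical helper, not part of any library).

    Raises FileNotFoundError if wget is not installed, and
    subprocess.CalledProcessError if wget exits with a non-zero status.
    """
    if shutil.which("wget") is None:
        raise FileNotFoundError("the wget executable was not found on PATH")
    subprocess.run(build_mirror_command(url), cwd=dest_dir, check=True)


# Example call (performs a real download, so it is left commented out):
# mirror_site("http://example.org", dest_dir=".")
```

Passing the arguments as a list avoids shell quoting problems with user-supplied URLs. Note that wget can exit with a non-zero status even when most files were saved (for example, when one linked resource returns an error), so check=True may be stricter than you want.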

Karthik Venkatraman
  • Hi Karthik, thanks for the reply. I don't understand where to execute the code you provided. Can I embed it in the Python code above? – karan juneja Mar 23 '17 at 04:19
  • Yes, you can embed it in your code. Alternatively, you can use the code below, replacing the wget -r command with the one I gave above: import os; path = raw_input("enter the url:"); os.system('wget -r -nd -l1 --no-parent -A mp3 %s' % path) (raw_input is Python 2; in Python 3 use input.) – Karthik Venkatraman Mar 23 '17 at 05:29