Script to download website source to a folder

Question

I am trying to learn simple automation. I have set up an Ubuntu Server and want to configure it to download the HTML source from a specific URL and append it to a file in a specified folder on the server every minute.

The page at that URL is just basic HTML with no CSS whatsoever.

I want to use Python, but admittedly I can use any language. What is a good, simple way to do this?

Would you like to append the contents to the same file or to a new file each time? — rohithpr, Jun 01 '15 at 17:27

Jeff K · Answer 1 · 2015-06-01T17:45:48.877

0

Just pip install the requests library.

$ pip install requests

Then, it's super easy to get the HTML (put this in a file called get_html.py, or whatever name you like):

import requests

req = requests.get('http://docs.python-requests.org/en/latest/user/quickstart/')

print(req.text)

There are a variety of options for saving the HTML to a directory. For example, you could redirect the output of the script above to a file by calling it like this:

 python get_html.py > file.html

Note that > overwrites file.html on every run; use >> instead if you want each run to append.
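
Since the question asks for the HTML to be appended to a file in a specific folder, another option is to write the file from Python rather than rely on shell redirection. Here is a minimal sketch; the helper name append_html and the folder and file names in the usage note (html_dumps, page.html) are made up for illustration:

```python
from pathlib import Path


def append_html(html, folder, filename):
    """Append html to folder/filename, creating the folder if needed."""
    target_dir = Path(folder)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    # Mode "a" appends, so earlier downloads are kept.
    with target.open("a", encoding="utf-8") as fp:
        fp.write(html)
        fp.write("\n")  # separate one download from the next
    return target
```

In get_html.py you would then call append_html(req.text, 'html_dumps', 'page.html') in place of print(req.text).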

Hope this helps

edited Jun 01 '15 at 17:45

answered Jun 01 '15 at 16:59


I would recommend using pip3 and python3. A word of caution: when you name a file, make sure you don't give it the same name as an existing module. A simple typo could cause nasty errors. For example, avoid naming a file "random.py" or "requests.py". "request.py" works, but be careful. – rohithpr Jun 01 '15 at 17:31

score 0 · Answer 2 · answered Jun 01 '15 at 17:39

Jeff's answer works for one-time use. To run it repeatedly, you could do this:

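import time
import requests

while True:
    # 'a' opens the file in append mode, so each download is added to the end
    with open('filename.extension', 'a') as fp:
        newHtml = requests.get('url').text
        fp.write(newHtml)
    # wait 60 seconds before fetching again
    time.sleep(60)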

You could run this as a background process for as long as you want.

$ python3 script_name.py &
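
One thing to watch with a loop left running in the background: if a single request fails (a network hiccup, the server being down), the uncaught exception ends the script. Below is a rough sketch of a more forgiving loop. It swaps requests for the standard library's urllib.request so that it has no dependencies. The names fetch_html and poll, and the max_runs, fetch and sleep parameters, are my own assumptions added to make it easy to try out; they are not part of either answer.

```python
import time
import urllib.request


def fetch_html(url, timeout=10):
    """Return the page body as text; network failures raise OSError subclasses."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def poll(url, path, interval=60, max_runs=None, fetch=fetch_html, sleep=time.sleep):
    """Append the page at url to path every interval seconds.

    max_runs=None loops forever; a number stops after that many attempts.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            html = fetch(url)
        except OSError as exc:  # URLError and socket timeouts are OSError subclasses
            print(f"fetch failed, will retry: {exc}")
        else:
            with open(path, "a", encoding="utf-8") as fp:
                fp.write(html + "\n")
        runs += 1
        sleep(interval)
```

Calling poll('url', 'filename.extension') behaves like the loop above, except that a failed download is logged and skipped instead of stopping the process.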
