I was trying to follow this thread, which seemed to answer my question. It is a great example of how to download all the links on a webpage using Mechanize:
Download all the links(related documents) on a webpage using Python
I used the code posted there:
import mechanize
from time import sleep
#Make a Browser (think of this as chrome or firefox etc)
br = mechanize.Browser()
#visit http://stockrt.github.com/p/emulating-a-browser-in-python-with-mechanize/
#for more ways to set up your br browser object e.g. so it looks like mozilla
#and if you need to fill out forms with passwords.
# Open your site
br.open('http://pypi.python.org/pypi/xlwt')
f=open("source.html","w")
f.write(br.response().read()) #can be helpful for debugging maybe
filetypes=[".zip",".exe",".tar.gz"] #you will need to do some kind of pattern matching on your files
myfiles=[]
for l in br.links(): #you can also iterate through br.forms() to print forms on the page!
    for t in filetypes:
        if t in str(l): #check if this link has the file extension we want (you may choose to use reg expressions or something)
            myfiles.append(l)
def downloadlink(l):
    f=open(l.text,"w") #perhaps you should ensure that file doesn't already exist.
    br.click_link(l)
    f.write(br.response().read())
    print l.text," has been downloaded"
    #br.back()
for l in myfiles:
    sleep(1) #throttle so you don't hammer the site
    downloadlink(l)
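The comments in that code suggest some kind of pattern matching on the file extension. One possible approach, sketched here as an assumption rather than what the original thread used, is to test only the path part of each link's URL with `endswith`, instead of searching `str(l)`, which also contains the link text. With this approach each link is added at most once, and a link whose text merely mentions "file.zip" is not picked up. `has_wanted_extension` is a hypothetical helper name.

```python
try:  # Python 3
    from urllib.parse import urlparse
except ImportError:  # Python 2.7, as used in the question
    from urlparse import urlparse

FILETYPES = (".zip", ".exe", ".tar.gz")

def has_wanted_extension(url, filetypes=FILETYPES):
    """Return True if the URL's path ends with one of filetypes.

    The query string and fragment are ignored, and the comparison is
    case-insensitive, so '.../a.ZIP?x=1' counts as a .zip link.
    """
    path = urlparse(url).path.lower()
    return path.endswith(tuple(filetypes))
```

With Mechanize this could replace the nested loop, e.g. `myfiles = [l for l in br.links() if has_wanted_extension(l.url)]`, since Mechanize `Link` objects carry the link target in their `url` attribute.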
I changed only this line:
f=open(l.text,"w") #perhaps you should open in a better way & ensure that file doesn't already exist.
To:
f=open('C:\\l.text',"w") #perhaps you should open in a better way & ensure that file doesn't already exist.
That made the code run; otherwise it raised an error (shown below). When I run it, I get the following output:
Download> xlwt-0.7.5.tar.gz has been downloaded
xlwt-0.7.5.tar.gz has been downloaded
So it seems to work, but I have no idea where the file was saved. Any ideas? I have searched my C drive and could not find it.
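One way to check where a file actually ends up is to print the absolute form of the path passed to `open()`. A relative name is resolved against the current working directory (`os.getcwd()`), which is not necessarily the folder containing the script. The sketch below uses a hypothetical helper, `open_and_report`, which is not part of Mechanize. It defaults to binary mode (`"wb"`) because archives such as `.tar.gz` are binary data, and text mode `"w"` on Windows rewrites newline bytes as `\r\n`.

```python
import os

def open_and_report(path, mode="wb"):
    """Open path for writing and print the absolute location it resolves to.

    Returns the open file object and the absolute path, so the caller can
    log or inspect where the data is being written.
    """
    f = open(path, mode)
    location = os.path.abspath(path)
    print("Writing to: " + location)
    return f, location
```

Inside `downloadlink`, calling `f, where = open_and_report(...)` in place of the plain `open(...)` would print the full path of each download as it is created.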
If the code is run as:
f=open(l.text,"w")
It raises the following exception:
Traceback (most recent call last):
File "C:\Python27\mech.py", line 33, in <module>
downloadlink(l)
File "C:\Python27\mech.py", line 25, in downloadlink
f=open(l.text,"w") #perhaps you should ensure that file doesn't already exist.
IOError: [Errno 22] invalid mode ('w') or filename: 'Download> <span style="font-size: 75%">xlwt-0.7.5.tar.gz<span>'
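The file name in that error is the link text, and it contains HTML markup (`<span style="font-size: 75%">`). Characters such as `<`, `>` and `"` are not allowed in Windows file names, which would likely explain the `[Errno 22]`. A minimal sketch of one workaround, assuming the link's URL ends with the real file name (as PyPI-style links like `.../xlwt-0.7.5.tar.gz` do), is to build the name from the URL path instead of `l.text`. It also avoids overwriting an existing file, as the code comment suggests. `safe_filename` and its `dest_dir` parameter are hypothetical.

```python
import os
import posixpath

try:  # Python 3
    from urllib.parse import urlparse, unquote
except ImportError:  # Python 2.7
    from urlparse import urlparse
    from urllib import unquote

WINDOWS_FORBIDDEN = '<>:"/\\|?*'

def safe_filename(url, dest_dir="."):
    """Derive a file path inside dest_dir from the last segment of url's path.

    Characters Windows rejects in file names are replaced with '_'. If the
    name is already taken, a numeric prefix is added so the extension
    (e.g. '.tar.gz') stays intact.
    """
    name = unquote(posixpath.basename(urlparse(url).path)) or "download"
    name = "".join("_" if c in WINDOWS_FORBIDDEN else c for c in name)
    candidate = os.path.join(dest_dir, name)
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(dest_dir, "%d_%s" % (counter, name))
        counter += 1
    return candidate
```

In `downloadlink`, something like `f = open(safe_filename(l.url, "C:\\downloads"), "wb")` could replace `open(l.text, "w")`; the `C:\downloads` folder is only an example and would need to exist.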