
I am trying to use urllib to fetch a web page and then parse its source so I can collect some data from it. I know how to do this for public websites, but not for password-protected pages. I know the username and password; I am just confused about how to get urllib to submit the credentials and then take me to the page I want to scrape. My current code is below. The problem is that it returns the login page's source instead of the page I requested.

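    import csv
    import urllib.request
    from re import findall
    from tkinter import filedialog


    def info():
        # Ask the user for the CSV file that holds the IDs to look up.
        file = filedialog.askopenfilename()
        aList = []
        with open(file, 'r', newline='') as fileR:
            hold = csv.reader(fileR, delimiter=',', quotechar='|')
            for item in hold:
                # Only look up rows with an ID in column 1 and an empty column 2.
                if item[1] and item[2] == "":
                    print(item[1])
                    url = "http://www.example.com/id=" + item[1]
                    # No credentials are sent here, so the login page comes back.
                    request = urllib.request.urlopen(url)
                    html = request.read()
                    data = html.decode('utf-8', errors='replace')
                    person = findall(r'\$MainContent\$txtRecipient"\stype="text"\svalue="([^"]+)"', data)
                    aList.append(person)
        return aList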

Note that I am using Python 3.3.3. Any help would be appreciated!

Thomas
  • This will probably help you: http://stackoverflow.com/questions/13925983/login-to-website-using-urllib2-python-2-7 – Lesmana Feb 12 '15 at 15:41
  • I know you asked how to do this using urllib, but it might be worth looking into the Python requests library. It is very helpful for web scraping: http://docs.python-requests.org/en/latest/user/authentication/ – user3636636 Jun 18 '15 at 04:53
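
For reference, here is a rough sketch of how a login could be done with urllib alone. It is not a confirmed solution for this particular site. It assumes the site uses a form-based login that sets a session cookie; LOGIN_URL, USERNAME_FIELD and PASSWORD_FIELD are hypothetical placeholders, and the real values would have to be read from the login page's HTML. The ASP.NET-style field name in the question's regex suggests the form may also require hidden fields such as __VIEWSTATE, but that is a guess. A second helper covers the other common case, HTTP Basic authentication.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Hypothetical values: the real login URL and form field names must be
# taken from the login page's <form> action and <input> names.
LOGIN_URL = "http://www.example.com/login"
USERNAME_FIELD = "username"
PASSWORD_FIELD = "password"


def make_session_opener():
    """Return an opener that stores cookies and resends them on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, username, password):
    """Build a POST request carrying the credentials as URL-encoded form data."""
    form = urllib.parse.urlencode({USERNAME_FIELD: username, PASSWORD_FIELD: password})
    # Supplying data= makes urllib send a POST instead of a GET.
    return urllib.request.Request(login_url, data=form.encode("utf-8"))


def fetch_after_login(opener, login_url, username, password, page_url):
    """Log in once, then fetch a protected page with the same opener."""
    with opener.open(build_login_request(login_url, username, password)) as resp:
        resp.read()  # any session cookie the site set is now in the cookie jar
    with opener.open(page_url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def make_basic_auth_opener(top_level_url, username, password):
    """Alternative for sites that use HTTP Basic auth instead of a login form."""
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, top_level_url, username, password)
    return urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(manager))
```

Under these assumptions, the loop in info() would create one opener with make_session_opener() before iterating, then call fetch_after_login(opener, LOGIN_URL, user, password, url) for the first row and opener.open(url) for the rest, so every request reuses the same session cookie.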

0 Answers