
Preface: I understand that there are many answers to similar questions on Stack Overflow. However, I haven't found anything relating to ASPX logins, nor a case exactly like this one.

Problem: I need to determine what information is required to log in to https://cableone.net/login.aspx so that I can scrape information from the site.

Progress: Thus far I have found the input fields in the source of login.aspx and have scraped together a script in Python with urllib, urllib2, and cookielib. In my script I ignored any field that had a blank value.

<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE"value="/wEPDwUIMzc1NzEwOTZkZFAEfkjXC+VNsqYoayGxa5/q4srT" />
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWBAK6lKDUCwLVx7ufCQL/+N3OBwLFgNGYD6KeUd6uNDBwc5zcR0u4hqrwv1fM" />
<input name="ctl00$plhMain$txtUserName" type="text" id="ctl00_plhMain_txtUserName" />
<input name="ctl00$plhMain$txtPassword" type="password" id="ctl00_plhMain_txtPassword" />
<input type="submit" name="ctl00$plhMain$btnLogin" value="Login" id="ctl00_plhMain_btnLogin" />

I then used the input values above with Python and urllib2 in the following script.

import urllib, urllib2, cookielib

url = 'https://myaccount.cableone.net/Login.aspx'

# Cookie-aware opener so that any session cookie persists across requests
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

# determine what I need to change with these values
formValues = {
    "__VIEWSTATE":"/wEPDwUIMzc1NzEwOTZkZFAEfkjXC+VNsqYoayGxa5/q4srT",
    "__EVENTVALIDATION":"/wEWBAK6lKDUCwLVx7ufCQL/+N3OBwLFgNGYD6KeUd6uNDBwc5zcR0u4hqrwv1fM",
    "ctl00$plhMain$txtUserName":"myAccount",
    "ctl00$plhMain$txtPassword":"myPassword"
    }

# URL-encode the form fields; passing data makes the request a POST
data = urllib.urlencode(formValues)

response = opener.open(url, data)
thePage = response.read()
httpheaders = response.info()
print thePage
arete
  • Look at what data is sent in the dev tools of your browser. It doesn't matter whether it's ASPX or not, or at least it shouldn't. As far as you're concerned, it's just an HTTP service. Make sure that you fake the headers as well; some websites check them (user agent, referer, etc.). I also suggest using the `requests` module. – gatto Apr 08 '13 at 19:41
  • Thanks for the heads up. I'll look further into faking the headers. As for the rest of my input values, am I missing anything? – arete Apr 08 '13 at 19:43
  • Well, for one, you should parse the form (use `lxml`) to get the values, because they are most likely dynamic, so hardcoded values won't do. And I would keep everything, even empty hidden inputs, just to be sure. OK, so the process is `load page - parse and get the form - post form data with cookies and headers`. – gatto Apr 08 '13 at 19:52
  • What body of knowledge should I look up if I want to understand what everything refers to, such as cookies, HTTP headers, and the like? – arete Apr 09 '13 at 18:03

1 Answer


The approach you outlined will be difficult if the form is dynamic in any way. A more universal way is to install Google Chrome Canary, which has good developer tools: click "inspect page", go to the "Network" tab, and check "Preserve log". (You may need the Canary version because, if I'm not mistaken, the regular one doesn't capture some of the data.)

With all this open, click "login" and you'll see every request along with its headers, including all the POST data that is sent to the server.
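Once you know which fields the browser sends, the values that change between page loads (such as `__VIEWSTATE` and `__EVENTVALIDATION`) can be read from a fresh copy of the page instead of being hardcoded. The following is only a minimal sketch of that idea, written for Python 3 with the standard library rather than the Python 2 modules used in the question. `FormInputParser`, `build_login_payload`, and `SAMPLE_HTML` are hypothetical names made up for illustration, and the sample markup is a shortened stand-in for the real page, not its actual content.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormInputParser(HTMLParser):
    """Collect the name/value pair of every <input> tag that has a name."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            # Keep empty hidden inputs too; a missing value becomes "".
            self.fields[name] = attributes.get("value") or ""


def build_login_payload(html, username, password):
    """Parse the login page and fill in the two credential fields."""
    parser = FormInputParser()
    parser.feed(html)
    payload = dict(parser.fields)
    payload["ctl00$plhMain$txtUserName"] = username
    payload["ctl00$plhMain$txtPassword"] = password
    return payload


# Shortened stand-in for the login page; the values are placeholders.
SAMPLE_HTML = """
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc123" />
<input name="ctl00$plhMain$txtUserName" type="text" />
<input name="ctl00$plhMain$txtPassword" type="password" />
<input type="submit" name="ctl00$plhMain$btnLogin" value="Login" />
"""

payload = build_login_payload(SAMPLE_HTML, "myAccount", "myPassword")
post_body = urlencode(payload).encode("ascii")  # bytes suitable for a POST
```

In a real script the HTML would come from a GET of the login page made through the same cookie-aware opener that later sends `post_body`, so that any session cookie set on the first request is returned with the POST.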

Now you can test the data in your script and remove fields one by one to see which of them the server actually requires. Another option for testing the requests is to use Advanced REST Client, by the way.
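The one-by-one removal can also be scripted. This is a rough Python 3 sketch, not a definitive implementation: `find_required_fields` is a hypothetical helper, and `fake_login` is a made-up stand-in for whatever function you write to POST a payload and decide from the response whether the login worked.

```python
def find_required_fields(payload, login_succeeds):
    """Return the field names whose removal makes the login fail.

    Each field is dropped on its own from an otherwise complete payload.
    """
    required = []
    for name in payload:
        trial = {key: value for key, value in payload.items() if key != name}
        if not login_succeeds(trial):
            required.append(name)
    return required


def fake_login(fields):
    """Stand-in for a real request: pretends these three fields are needed."""
    needed = {"__VIEWSTATE", "ctl00$plhMain$txtUserName", "ctl00$plhMain$txtPassword"}
    return needed.issubset(fields)


# Placeholder values; a real payload would come from the captured POST data.
sample_payload = {
    "__EVENTTARGET": "",
    "__VIEWSTATE": "abc123",
    "ctl00$plhMain$txtUserName": "myAccount",
    "ctl00$plhMain$txtPassword": "myPassword",
}

required_fields = find_required_fields(sample_payload, fake_login)
```

Because every trial removes only a single field, this finds fields that are required individually; it would not notice a case where either of two fields is enough on its own.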

kolinko