
I need to extract the source code from a website that requires a login. I can open the page in my browser by clicking the link, because I have already logged in there and the cookie is stored. Click here to view the Manual Process diagram

However, when I try to fetch the same link with Python:

import urllib

link = "http://www.somesite.com/details.pl?urn=2344"
f = urllib.urlopen(link)
myfile = f.read()
print myfile

The result is always the source code of the login page.

Can someone help me with this? Thanks a lot.

UPDATE 1.0:

I have tried the answer at stackoverflow.com/a/13955538/968442 suggested by itsneo, and it works perfectly with my Reddit account. However, after I updated the user name, password, and URL (I have double-checked these values), I am still stuck at the login page of the site I want to access.

The following is the login page I am trying to get past. Do I need to add any additional attributes to the code?

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
  <head>
    <title></title>
    <link href="/stylesheets/ab.css?1447805380" media="screen" rel="stylesheet" type="text/css" />
    <link href="/stylesheets/login.css?1412579810" media="screen" rel="stylesheet" type="text/css" />
  </head>
  <body>

  <span class="centerblockabsolute login">
  <div class="base-layer">
    <div id="banner">
      active billing
    </div>
    <form action="/login/login" method="post">
    <div id="leftcontent">
    <div>    
          </div>
      <span class="error"></span><br />
      <table>
        <tr>
          <td><label for="username">user name:</td>
          <td><input id="username" name="username" size="20" type="text" /></td>
        </tr>
        <tr>
          <td><label for="passsword">Password: </label></td>
          <td><input id="password" name="password" size="22" type="password" /></td>
        </tr>     
        <tr>
          <td align="right" colspan="2">
            <input name="submit" src="/images/buttons/login.gif?1412579810" type="image" value="submit" />
          </td>
        </tr>   
      </table>  
    </div>
    <div id="rightcontent">
          <img alt="Logo-ab" src="/images/logo-ab.png?1412579810" />
          <input id="redirect_url" name="redirect_url" type="hidden" value="https://eutility.activebilling.com.au/" />
    </div>
</form>

  </div>

  </span>
  </body>
</html>
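
To see which fields this form would actually submit, the inputs can be listed with a short script. This is only a sketch for Python 3 using the standard library; `LOGIN_HTML` is a trimmed copy of the form above, and `FormFieldCollector` is a name made up for this example.

```python
from html.parser import HTMLParser

# Trimmed copy of the login form shown above (layout markup omitted).
LOGIN_HTML = """
<form action="/login/login" method="post">
  <input id="username" name="username" size="20" type="text" />
  <input id="password" name="password" size="22" type="password" />
  <input name="submit" src="/images/buttons/login.gif?1412579810" type="image" value="submit" />
  <input id="redirect_url" name="redirect_url" type="hidden" value="https://eutility.activebilling.com.au/" />
</form>
"""


class FormFieldCollector(HTMLParser):
    """Hypothetical helper: records the form's action/method and each named <input>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = None
        self.fields = {}  # input name -> (input type, default value)

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <input ... /> are also routed here.
        attrs = dict(attrs)
        if tag == "form":
            self.action = attrs.get("action")
            self.method = (attrs.get("method") or "get").lower()
        elif tag == "input" and attrs.get("name"):
            self.fields[attrs["name"]] = (
                attrs.get("type", "text"),
                attrs.get("value", ""),
            )


collector = FormFieldCollector()
collector.feed(LOGIN_HTML)
print(collector.action, collector.method)
for name, (kind, value) in collector.fields.items():
    print(name, kind, repr(value))
```

Besides `username` and `password`, the form carries a hidden `redirect_url` field and an image submit button, so a script that posts only the two visible fields sends less than a browser would. Whether the server needs the extra fields is not known from the page alone.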
Ricky Zheng
  • Does this help? http://stackoverflow.com/a/13955538/968442 – nehem Nov 26 '15 at 23:22
  • If you logged in using the browser, the cookie is in the browser; your Python has no idea about it. Use [mechanize](http://wwwsearch.sourceforge.net/mechanize/) to easily interact with web pages from Python, including log-in and subsequent use of the cookie. – Amadan Nov 26 '15 at 23:22
  • Personally I'd suggest using the requests library rather than much lower-level use of urllib - requests has 'session's which will persist cookies across a whatjamacallit, oh yes, session. See http://docs.python-requests.org/ - for more info on sessions http://docs.python-requests.org/en/latest/user/advanced/#session-objects - for more info on authentication http://docs.python-requests.org/en/latest/user/authentication/ – DisappointedByUnaccountableMod Nov 26 '15 at 23:47
  • Can't you open the page in the browser, authenticate, and then save the page? – DisappointedByUnaccountableMod Nov 26 '15 at 23:50
  • Thanks for all the responses. @itsneo, can you please help me with my new update? – Ricky Zheng Nov 27 '15 at 00:36
  • What do you mean by stuck at the login page? Can you give us the response HTML / status code? And could you post the code you used? – MK Yung Nov 27 '15 at 01:24
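
The idea raised in the comments (log in from Python itself and keep the cookie for later requests) might look roughly like the sketch below. It uses the Python 3 standard library (`urllib.request` with `http.cookiejar`) rather than the mechanize or requests libraries the commenters suggest, so that it has no extra dependencies. Several things here are assumptions, not facts from the post: that the login form is served from the same host as its `redirect_url` value, that the server accepts the extra `submit.x`/`submit.y` fields a browser sends for an image button, and the helper names `make_opener`, `build_login_body`, and `fetch_after_login`.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Assumption: the login form is served from the host named in redirect_url.
BASE_URL = "https://eutility.activebilling.com.au"
LOGIN_URL = urllib.parse.urljoin(BASE_URL, "/login/login")  # the form's action


def make_opener():
    """Return an opener that stores cookies and resends them on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_body(username, password):
    """URL-encode the fields a browser would send for the login form."""
    fields = {
        "username": username,
        "password": password,
        "redirect_url": BASE_URL + "/",  # hidden field copied from the form
        "submit.x": "0",  # image buttons submit click coordinates;
        "submit.y": "0",  # assumption: the server may or may not need them
    }
    return urllib.parse.urlencode(fields).encode("ascii")


def fetch_after_login(username, password, protected_url):
    """Log in, then request protected_url using the same cookie jar."""
    opener, _jar = make_opener()
    # Supplying a body makes this a POST; any session cookie lands in the jar.
    opener.open(LOGIN_URL, data=build_login_body(username, password)).close()
    with opener.open(protected_url) as response:
        return response.read().decode("utf-8", errors="replace")


# Example call (placeholder credentials; performs real network requests):
# print(fetch_after_login("my_user", "my_password", BASE_URL + "/"))
```

If the response is still the login page, comparing this request body with what the browser sends (visible in the browser's network inspector) would show whether a field or header is missing.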

0 Answers