
I currently use selenium to log into a web page and get the cookie I need to access the site. I then use that cookie to authenticate a bunch of JSON RPC requests made with pycurl. The following code (and the later pycurl JSON RPC requests) works perfectly:

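from selenium import webdriver

driver = webdriver.PhantomJS()
driver.get(my_url)
# Fill in the login form and submit it
driver.find_element_by_name('u').send_keys(username)
driver.find_element_by_name('p').send_keys(password)
button = driver.find_element_by_tag_name('button')
button.click()
# The first cookie is the session cookie I need
cookie = driver.get_cookies()[0]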

However, I am trying to remove external dependencies, particularly the dependency on a webdriver (PhantomJS in my case), and use pycurl for the job instead. I have tried the following:

import pycurl
from io import BytesIO
buffer = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, my_url)
c.setopt(c.TIMEOUT, 5)
c.setopt(c.HEADER, 1)
c.setopt(c.FOLLOWLOCATION, True)  # Follow redirects
c.setopt(c.AUTOREFERER, True)
c.setopt(c.POSTREDIR, pycurl.REDIR_POST_ALL)  # Follow redirects after post ...
c.setopt(c.POSTFIELDS, 'u='+ username + '&p=' + password + '&submit=Login')
c.setopt(c.COOKIEJAR, 'xcomfort.cookie')
c.setopt(c.VERBOSE, True)
c.setopt(c.WRITEFUNCTION, buffer.write)

c.perform()
c.close()

However, the verbose output from pycurl is:

*   Trying 83.201.233.76...
* Connected to somehost.dyndns.org (83.201.233.76) port 8080 (#0)
> POST /bcgui/index.html HTTP/1.1
Host: somehost.dyndns.org:8080
User-Agent: PycURL/7.43.0 libcurl/7.47.0 OpenSSL/1.0.2e zlib/1.2.8 c-ares/1.10.0 libssh2/1.6.0
Accept: */*
Content-Length: 36
Content-Type: application/x-www-form-urlencoded

* upload completely sent off: 36 out of 36 bytes
< HTTP/1.1 401 Unauthorized
< content-type: text/html; charset=UTF-8
< transfer-encoding: chunked
< cache-control: no-cache, no-store, must-revalidate, max-age=0
< date: Thu, 21 Apr 2016 08:19:49 GMT
< pragma: no-cache
< www-authenticate: None
* Added cookie JSESSIONID="ID3407DB686398770End" for domain somehost.dyndns.org, path /, expire 0
< set-cookie: JSESSIONID=ID3407DB686398770End; Path=/; HttpOnly
<
* Connection #0 to host somehost.dyndns.org left intact

As you can see, I get a 401 error here.

The page I am trying to log in to has the following login form:

<form method="post" action="/system/http/login">
    <div id="login_dialog" class="ui-dialog ui-widget ui-widget-content ui-corner-all">

        <table>
            <tr>
                <td><span class="ui-widget-header-1 ui-helper-clearfix ui-dialog-title">Smart Home Controller</span></td>
                <td><img src="/system/http/img/eaton_logo.jpg" /></td>
            </tr>
        </table>

        <div class="ui-dialog-titlebar ui-widget-header ui-corner-all ui-helper-clearfix" >
            <span class="ui-dialog-title">Please login</span>
        </div>

        <div id="editor" class="ui-dialog-content" >
            <div id="error_message" class="ui-state-error ui-helper-hidden"></div>
            <div id="remaining_time" class="ui-state-error ui-helper-hidden">User is locked out for <span>0</span>.</div>
            <table>
                <tr>
                    <td class="r">Username:</td>
                    <td><input name="u"/></td>
                </tr>
                <tr>
                    <td class="r">Password:</td>
                    <td><input type="password" autocomplete="off" name="p"/></td>
                </tr>
                <tr>
                    <td class="r">&nbsp;</td>
                    <td>
                        <input type="checkbox" name="r"/> Remember me
                        <input type="hidden" name="referer" value="/" />
                    </td>
                </tr>
            </table>
        </div>

        <div class="ui-dialog-buttonpane ui-widget-content ui-helper-clearfix">
            <button type="submit">Login</button>
        </div>
    </div>
    </form>

I'm completely stumped here. Selenium works beautifully, but pycurl keeps giving me 401. Since I'm expecting someone will tell me to use requests for this, I also did:

import requests
headers = {'User-Agent': 'Mozilla/5.0'}
data = {'u': username, 'p': password }

session = requests.Session()
session.get(my_url)

response = session.post(my_url, json=data, headers=headers)
print(response)
print(requests.utils.dict_from_cookiejar(session.cookies))

However, this yields:

<Response [401]>
{'JSESSIONID': 'ID3410DB1909202780End'}

This is basically the same problem: the cookie contains the session ID, but the session is not authenticated, so the cookie cannot be used for later requests.

Any pointers on where I might be going wrong here? I prefer the pycurl approach since I use pycurl for the JSON RPCs in the rest of the code, but I'm certainly open to any ideas.

UPDATE: Strangely, if I use HTTP basic authentication and simply ignore the form on the page, it works. I have no idea why, since the browser never shows a login prompt, only a web page with username and password fields. Still, it works. The following code gives me a session cookie that is authorized:

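import requests

def get_session_id(url, username, password):
    headers = {'User-Agent': 'Mozilla/5.0'}

    session = requests.Session()
    session.get(url)

    # Credentials go in the auth argument (HTTP basic auth), not in the form body
    response = session.post(url, headers=headers, auth=(username, password))

    session_id = requests.utils.dict_from_cookiejar(session.cookies)['JSESSIONID']
    return session_id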
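
Since the rest of the code uses pycurl, the same idea could probably be expressed there as well. The following is only a rough sketch under assumptions: it assumes the server accepts HTTP basic authentication on a plain GET of the URL (the requests version above uses a POST), and the helper names `cookie_value` and `fetch_session_id` are made up for illustration. libcurl reports its cookies as Netscape-format lines with seven tab-separated fields, so the session ID is picked out of those.

```python
from io import BytesIO


def cookie_value(cookie_lines, name):
    """Return the value of cookie `name` from Netscape-format cookie lines.

    Each line has seven tab-separated fields:
    domain, subdomain flag, path, secure flag, expiry, name, value.
    """
    for line in cookie_lines:
        fields = line.split("\t")
        if len(fields) == 7 and fields[5] == name:
            return fields[6]
    return None


def fetch_session_id(url, username, password):
    """Hypothetical helper: request `url` with basic auth, return JSESSIONID."""
    import pycurl  # imported lazily so cookie_value() works without pycurl

    body = BytesIO()
    c = pycurl.Curl()
    try:
        c.setopt(pycurl.URL, url)
        c.setopt(pycurl.TIMEOUT, 5)
        # Send the credentials as HTTP basic auth instead of form fields
        c.setopt(pycurl.HTTPAUTH, pycurl.HTTPAUTH_BASIC)
        c.setopt(pycurl.USERPWD, username + ":" + password)
        # An empty file name turns on libcurl's in-memory cookie engine
        c.setopt(pycurl.COOKIEFILE, "")
        c.setopt(pycurl.WRITEFUNCTION, body.write)
        c.perform()
        return cookie_value(c.getinfo(pycurl.INFO_COOKIELIST), "JSESSIONID")
    finally:
        c.close()
```

Calling `fetch_session_id(my_url, username, password)` would then stand in for the requests function above. Whether the returned session is actually authorized still has to be checked against the server.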
olesk
  • How do you authenticate with the server? Are you sure you put your username and password in the request body, or does it use something like [HTTP basic authentication](https://en.wikipedia.org/wiki/Basic_access_authentication)? – Greg Apr 21 '16 at 09:00
  • No, as you can see from the selenium example that I'm trying to get away from (but which works), I fill the username and password field and click the Submit button. This works perfectly, and is also how I manually log in to the site. – olesk Apr 21 '16 at 09:28
  • It's good to note that when debugging web requests you want to make with cURL, you should use the web inspector in your browser, click on the network tab, and inspect the HTTP requests. That way you can mimic them pretty much exactly. – RattleyCooper Apr 21 '16 at 15:31
  • Like I mentioned, look at the web page and determine the authentication method through your own investigation. Does it use OAuth2? Does it use HTTP basic authentication? It's totally site dependent. As @DuckPuncher mentioned, you can debug with the Chrome web debugger, or with Firebug if you use Firefox. – Greg Apr 22 '16 at 06:34
  • You're right Greg, and I noticed only after mucking about with Chrome debug tool that the page I was trying to log in through was actually the 401 page. It wasn't until I saw I got a 401 when loading the page that I realized that I could use basic authentication to skip the page completely. I should have listened to your first comment. – olesk Apr 26 '16 at 14:10
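
For reference, mimicking the browser's form submission as suggested in the comments would mean posting URL-encoded fields to the form's `action` URL rather than to the page URL. The sketch below only builds that request. It assumes plain HTTP on port 8080 as in the verbose log, the helper name `build_login_request` is hypothetical, and nothing in this thread confirms that the controller accepts such a request.

```python
from urllib.parse import urlencode, urljoin


def build_login_request(page_url, username, password):
    """Return (target_url, body) for a browser-style submit of the login form."""
    # The form posts to its action attribute, not to the page that displays it
    target = urljoin(page_url, "/system/http/login")
    # Named inputs from the form; the unchecked "r" checkbox is not submitted
    fields = {"u": username, "p": password, "referer": "/"}
    # urlencode escapes characters such as "&" that would corrupt the body
    return target, urlencode(fields)


target, body = build_login_request(
    "http://somehost.dyndns.org:8080/bcgui/index.html", "user", "p&ss word")
print(target)  # http://somehost.dyndns.org:8080/system/http/login
print(body)    # u=user&p=p%26ss+word&referer=%2F
```

The resulting `body` is what would go into `POSTFIELDS` for pycurl, or the `fields` dictionary into the `data=` argument for requests (not `json=`, which sends a JSON document instead of form fields).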
