I currently use Selenium to log into a web page and get the cookie I need to access the site. I then use that cookie to authenticate a series of JSON-RPC requests made with pycurl. The following code (and the later pycurl JSON-RPC requests) works perfectly:
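from selenium import webdriver

driver = webdriver.PhantomJS()
driver.get(my_url)
# Fill in the login form fields named 'u' and 'p', then submit
driver.find_element_by_name('u').send_keys(username)
driver.find_element_by_name('p').send_keys(password)
button = driver.find_element_by_tag_name('button')
button.click()
# The first cookie is the authenticated session cookie
cookie = driver.get_cookies()[0]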
However, I am trying to remove external dependencies, particularly the dependency on a webdriver (PhantomJS in my case), and use pycurl for the login as well. I have tried the following:
from io import BytesIO

import pycurl

buffer = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, my_url)
c.setopt(c.TIMEOUT, 5)
c.setopt(c.HEADER, 1)
c.setopt(c.FOLLOWLOCATION, True) # Follow redirects
c.setopt(c.AUTOREFERER, True)
c.setopt(c.POSTREDIR, pycurl.REDIR_POST_ALL) # Follow redirects after post ...
c.setopt(c.POSTFIELDS, 'u='+ username + '&p=' + password + '&submit=Login')
c.setopt(c.COOKIEJAR, 'xcomfort.cookie')
c.setopt(c.VERBOSE, True)
c.setopt(c.WRITEFUNCTION, buffer.write)
c.perform()
c.close()
However, the verbose output from pycurl is:
* Trying 83.201.233.76...
* Connected to somehost.dyndns.org (83.201.233.76) port 8080 (#0)
> POST /bcgui/index.html HTTP/1.1
Host: somehost.dyndns.org:8080
User-Agent: PycURL/7.43.0 libcurl/7.47.0 OpenSSL/1.0.2e zlib/1.2.8 c-ares/1.10.0 libssh2/1.6.0
Accept: */*
Content-Length: 36
Content-Type: application/x-www-form-urlencoded

* upload completely sent off: 36 out of 36 bytes
< HTTP/1.1 401 Unauthorized
< content-type: text/html; charset=UTF-8
< transfer-encoding: chunked
< cache-control: no-cache, no-store, must-revalidate, max-age=0
< date: Thu, 21 Apr 2016 08:19:49 GMT
< pragma: no-cache
< www-authenticate: None
* Added cookie JSESSIONID="ID3407DB686398770End" for domain somehost.dyndns.org, path /, expire 0
< set-cookie: JSESSIONID=ID3407DB686398770End; Path=/; HttpOnly
<
* Connection #0 to host somehost.dyndns.org left intact
As you can see, I get a 401 error here.
The page I am trying to log into has the following login form:
<form method="post" action="/system/http/login">
<div id="login_dialog" class="ui-dialog ui-widget ui-widget-content ui-corner-all">
<table>
<tr>
<td><span class="ui-widget-header-1 ui-helper-clearfix ui-dialog-title">Smart Home Controller</span></td>
<td><img src="/system/http/img/eaton_logo.jpg" /></td>
</tr>
</table>
<div class="ui-dialog-titlebar ui-widget-header ui-corner-all ui-helper-clearfix" >
<span class="ui-dialog-title">Please login</span>
</div>
<div id="editor" class="ui-dialog-content" >
<div id="error_message" class="ui-state-error ui-helper-hidden"></div>
<div id="remaining_time" class="ui-state-error ui-helper-hidden">User is locked out for <span>0</span>.</div>
<table>
<tr>
<td class="r">Username:</td>
<td><input name="u"/></td>
</tr>
<tr>
<td class="r">Password:</td>
<td><input type="password" autocomplete="off" name="p"/></td>
</tr>
<tr>
<td class="r"> </td>
<td>
<input type="checkbox" name="r"/> Remember me
<input type="hidden" name="referer" value="/" />
</td>
</tr>
</table>
</div>
<div class="ui-dialog-buttonpane ui-widget-content ui-helper-clearfix">
<button type="submit">Login</button>
</div>
</div>
</form>
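For reference, a browser submitting this form would send a URL-encoded body to the form's `action` URL (`/system/http/login`), including the hidden `referer` field, rather than posting to the page URL. The sketch below shows how that request could be built and sent with pycurl. The helper names `build_login_request` and `post_login` are hypothetical, the base URL is taken from the verbose log above, and whether this server accepts such a request is an assumption I have not verified.

```python
from io import BytesIO
from urllib.parse import urlencode, urljoin


def build_login_request(page_url, username, password):
    """Return (action_url, body) roughly as a browser would submit the form."""
    # The form's action is an absolute path, so resolve it against the page URL.
    action_url = urljoin(page_url, "/system/http/login")
    # urlencode escapes characters such as '&' or spaces in the credentials.
    # The "Remember me" checkbox ('r') is left out, as if unchecked.
    body = urlencode({"u": username, "p": password, "referer": "/"})
    return action_url, body


def post_login(action_url, body, cookie_file="xcomfort.cookie"):
    """Send the form body with pycurl and return the HTTP status code."""
    import pycurl  # imported here so build_login_request works without pycurl

    buffer = BytesIO()
    c = pycurl.Curl()
    c.setopt(pycurl.URL, action_url)
    c.setopt(pycurl.POSTFIELDS, body)          # implies a POST request
    c.setopt(pycurl.COOKIEJAR, cookie_file)    # cookies are written on close()
    c.setopt(pycurl.FOLLOWLOCATION, True)
    c.setopt(pycurl.TIMEOUT, 5)
    c.setopt(pycurl.WRITEFUNCTION, buffer.write)
    try:
        c.perform()
        return c.getinfo(pycurl.RESPONSE_CODE)
    finally:
        c.close()


if __name__ == "__main__":
    url, body = build_login_request(
        "http://somehost.dyndns.org:8080/bcgui/index.html", "user", "p&ss word"
    )
    print(url)
    print(body)
```

Calling `post_login(url, body)` afterwards would perform the actual request; it needs pycurl installed and network access to the controller.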
I'm completely stumped here. Selenium works beautifully, but pycurl keeps giving me 401. Since I'm expecting someone will tell me to use requests for this, I also did:
import requests
headers = {'User-Agent': 'Mozilla/5.0'}
data = {'u': username, 'p': password}
session = requests.Session()
session.get(my_url)
response = session.post(my_url, json=data, headers=headers)
print(response)
print(requests.utils.dict_from_cookiejar(session.cookies))
However, this yields:
<Response [401]>
{'JSESSIONID': 'ID3410DB1909202780End'}
This is basically the same problem: the cookie contains a session ID, but the session is not authenticated, so the cookie cannot be used for later requests.
Any pointers on where I might be going wrong here? I prefer the pycurl approach since I use pycurl for the JSON-RPC calls in the rest of the code, but I'm certainly open to other ideas.
UPDATE: Strangely, it works if I ignore the form on the page and pass the credentials through the `auth` parameter of requests, which sends them as HTTP basic authentication. I have no idea why, since the browser never shows a login prompt, only a web page with the username/password form. Still, the following code gives me a session cookie that is authorized:
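import requests

def get_session_id(url, username, password):
    headers = {'User-Agent': 'Mozilla/5.0'}
    session = requests.Session()
    session.get(url)
    # Credentials go in the auth tuple (HTTP basic auth); the form is not submitted
    response = session.post(url, headers=headers, auth=(username, password))
    session_id = requests.utils.dict_from_cookiejar(session.cookies)['JSESSIONID']
    return session_id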
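Since the rest of my code uses pycurl, the same basic-auth approach might translate roughly as follows. This is an untested sketch: `extract_cookie` and `get_session_id_pycurl` are names made up for illustration, and I am assuming that a single empty POST with basic auth is enough for the server to hand out an authorized `JSESSIONID`.

```python
from io import BytesIO


def extract_cookie(cookie_lines, name="JSESSIONID"):
    """Find a cookie value in libcurl's Netscape-format cookie lines."""
    for line in cookie_lines:
        # Tab-separated fields: domain, subdomain flag, path, secure flag,
        # expiry, name, value. HttpOnly cookies carry a "#HttpOnly_" prefix
        # on the domain field but keep the same layout.
        fields = line.split("\t")
        if len(fields) == 7 and fields[5] == name:
            return fields[6]
    return None


def get_session_id_pycurl(url, username, password):
    """Request url with HTTP basic auth and return the JSESSIONID, if any."""
    import pycurl  # imported here so extract_cookie works without pycurl

    c = pycurl.Curl()
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.HTTPAUTH, pycurl.HTTPAUTH_BASIC)
    c.setopt(pycurl.USERPWD, "{}:{}".format(username, password))
    c.setopt(pycurl.POSTFIELDS, "")    # empty POST, mirroring session.post()
    c.setopt(pycurl.COOKIEFILE, "")    # empty name enables the in-memory cookie engine
    c.setopt(pycurl.TIMEOUT, 5)
    c.setopt(pycurl.WRITEFUNCTION, BytesIO().write)  # discard the response body
    try:
        c.perform()
        cookies = c.getinfo(pycurl.INFO_COOKIELIST)
    finally:
        c.close()
    return extract_cookie(cookies)


if __name__ == "__main__":
    # Sample line in the format libcurl uses, built from the log output above.
    sample = ["#HttpOnly_somehost.dyndns.org\tFALSE\t/\tFALSE\t0\tJSESSIONID\tID3407DB686398770End"]
    print(extract_cookie(sample))
```

Reading the cookie through `INFO_COOKIELIST` avoids writing a cookie jar file to disk, which seemed simpler for passing the session ID on to the later JSON-RPC calls.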