
I'm trying to get the HTML content of a password protected site using Ghost.py.

The web server I have to access serves the following HTML (trimmed to the important parts):

URL: http://192.168.1.60/PAGE.htm

<html>
<head>
<script language="JavaScript">
    function DoHash()
    {
      var psw = document.getElementById('psw_id');
      var hpsw = document.getElementById('hpsw_id');
      var nonce = hpsw.value;
      hpsw.value = MD5(nonce.concat(psw.value));
      psw.value = '';
      return true;
    }
    </script>
</head>
<body>
<form action="PAGE.HTM" name="" method="post" onsubmit="DoHash();">
Access code <input id="psw_id" type="password" maxlength="15" size="20" name="q" value="">
<br>
<input type="submit" value="" name="q" class="w_bok">
<br>
<input id="hpsw_id" type="hidden" name="pA" value="180864D635AD2347">
</form>
</body>
</html>

The value of "#hpsw_id" changes every time you load the page.

In a normal browser, once you type the correct password and press Enter or click the "submit" button, you land on the same page, but now with the real content.

URL: http://192.168.1.60/PAGE.htm

<html>
<head>
<!-- javascript is gone -->
</head>
<body>
Welcome to PAGE.htm content
</body>
</html>

First I tried mechanize, but that failed because the page needs JavaScript. So now I'm trying to solve it with Ghost.py.

My code so far:

import ghost
g = ghost.Ghost()
with g.start(wait_timeout=20) as session:
    page, extra_resources = session.open("http://192.168.1.60/PAGE.htm")
    if page.http_status == 200:
        print("Good!")
        session.evaluate("document.getElementById('psw_id').value='MySecretPassword';")
        session.evaluate("document.getElementsByClassName('w_bok')[0].click();", expect_loading=True)
        print(session.content)

This code does not load the content correctly. In the console I get:

Traceback (most recent call last):
  File "", line 8, in
  File "/usr/local/lib/python2.7/dist-packages/ghost/ghost.py", line 181, in wrapper
    timeout=kwargs.pop('timeout', None))
  File "/usr/local/lib/python2.7/dist-packages/ghost/ghost.py", line 1196, in wait_for_page_loaded
    'Unable to load requested page', timeout)
  File "/usr/local/lib/python2.7/dist-packages/ghost/ghost.py", line 1174, in wait_for
    raise TimeoutError(timeout_message)
ghost.ghost.TimeoutError: Unable to load requested page

Two questions...

1) How can I successfully log in to the password-protected site and get the real content of PAGE.htm?

2) Is this the best way to go, or am I missing something that would make this work more efficiently?

I'm using Ubuntu Mate.

Ñhosko

1 Answer


This is not the answer I was looking for, just a workaround that makes it work (in case someone else has a similar issue in the future).

To skip the JavaScript part (which was preventing me from using Python's requests), I decided to compute the expected hash in Python rather than in the browser and send it the way the normal web form would.

The JavaScript simply concatenates the hidden hpsw_id value and the password (in that order) and takes the MD5 of the result.

The Python code now looks like this:

import requests
from hashlib import md5
from re import search

url = "http://192.168.1.60/PAGE.htm"
with requests.Session() as s:
    # Get hpsw_id number from website
    r = s.get(url)
    hpsw_id = search(r'name="pA" value="([A-Z0-9]*)"', r.text)
    hpsw_id = hpsw_id.group(1)
    # Make hash of ID and password (md5 needs bytes, so encode first)
    m = md5()
    m.update((hpsw_id + 'MySecretPassword').encode('utf-8'))
    pA = m.hexdigest()
    # Post to website to login
    r = s.post(url, data=[('q', ''), ('q', ''), ('pA', pA)])
    print(r.content)

Note: q, q and pA are the fields the form sends (q=&q=&pA=f08b97e5e3f472fdde4280a9aa408aaa) when I log in normally with a web browser.
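To check the request body offline before posting it, the hash and the form encoding can be separated into two small helpers. This is only a sketch: the names login_hash and form_body are made up for illustration, the UTF-8 encoding is an assumption (the page's MD5() function is not shown, so non-ASCII passwords might be hashed differently), and the import is written for Python 3.

```python
import hashlib
from urllib.parse import urlencode  # Python 3; on Python 2: from urllib import urlencode


def login_hash(nonce, password):
    """Mirror DoHash(): MD5 hex digest of the nonce followed by the password."""
    # Assumption: the page's MD5() hashes the same bytes as UTF-8 encoding gives.
    return hashlib.md5((nonce + password).encode("utf-8")).hexdigest()


def form_body(nonce, password):
    """Build the urlencoded body with the same fields, in the same order, as the browser."""
    fields = [("q", ""), ("q", ""), ("pA", login_hash(nonce, password))]
    return urlencode(fields)


if __name__ == "__main__":
    # Nonce copied from the sample page in the question; the password is a placeholder.
    print(form_body("180864D635AD2347", "MySecretPassword"))
```

The printed string should have the same q=&q=&pA=... shape as the body captured from the browser, which makes it easy to compare the two by eye.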

However, if someone knows the answer to my original question, I would very much appreciate it if you posted it here.

Ñhosko