I'm trying to scrape some information from my school's web page, but I'm having a hard time getting past the login. I know there are similar threads; I have spent the whole day reading them, but I cannot make it work.

This is the program I'm using (the username and password have been changed):

import requests

payload = {'ctl00$cphmain$Loginname': 'name', 'ctl00$cphmain$TextBoxHeslo': 'password'}

page = requests.post('http://gymnaziumbma.no-ip.org:81/login.aspx', payload)
open_page = requests.get("http://gymnaziumbma.no-ip.org:81/prehled.aspx?s=44&c=prub")

#Check content
if page.text == open_page.text:
    print("Same page")
else:
    print(open_page.text)
    print("Different page!")

Can you tell me what I'm doing wrong? Am I missing some parameter? Is requests a good method for this? I also tried RoboBrowser and BeautifulSoup, but that doesn't work either. I bet I'm missing something really trivial.

I'm using Python 3.5.

Daniel Guńka

1 Answer

First off, you are not using a Session, so even if your first POST successfully logs you in, the second request knows nothing about it. Second, you are missing data that needs to be posted: __VIEWSTATEGENERATOR and __VIEWSTATE, which you can parse from the source using BeautifulSoup:

import requests
from bs4 import BeautifulSoup

data = {'ctl00$cphmain$Loginname': 'name', 'ctl00$cphmain$TextBoxHeslo': 'password'}
# A Session object will persist the login cookies.
with requests.Session() as s:
    # Fetch the login page first to read the hidden ASP.NET form fields.
    page = s.get('http://gymnaziumbma.no-ip.org:81/login.aspx')
    soup = BeautifulSoup(page.content, "html.parser")
    data["__VIEWSTATE"] = soup.select_one("#__VIEWSTATE")["value"]
    data["__VIEWSTATEGENERATOR"] = soup.select_one("#__VIEWSTATEGENERATOR")["value"]
    # Post the credentials together with the hidden fields.
    s.post('http://gymnaziumbma.no-ip.org:81/login.aspx', data=data)
    open_page = s.get("http://gymnaziumbma.no-ip.org:81/prehled.aspx?s=44&c=prub")

    # Check content: compare the login page with the page fetched after login.
    if page.text == open_page.text:
        print("Same page")
    else:
        print(open_page.text)
        print("Different page!")

You can see all the form data that gets posted in Chrome dev tools.

What is posted above should be enough to get logged in. If it is not, any other value you need can be parsed from the login table using BeautifulSoup.
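If more hidden values turn out to be required, one option is to collect every hidden input on the login page instead of naming each one. The following is only a sketch under assumptions: it uses the standard library's `html.parser` so it runs without extra installs, the helper names `HiddenInputCollector` and `hidden_fields` are made up for illustration, and the sample markup (including the `__EVENTVALIDATION` field and all values) is invented rather than taken from the real page.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of every <input type="hidden"> element."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            # A missing value attribute is stored as an empty string.
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(html):
    """Return a dict of all hidden form fields found in the given HTML."""
    collector = HiddenInputCollector()
    collector.feed(html)
    return collector.fields


# Made-up sample markup standing in for the real login page (assumption).
sample_html = """
<form method="post" action="login.aspx">
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="C2EE9ABB" />
  <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="xyz789" />
  <input type="text" name="ctl00$cphmain$Loginname" />
</form>
"""

data = {'ctl00$cphmain$Loginname': 'name', 'ctl00$cphmain$TextBoxHeslo': 'password'}
data.update(hidden_fields(sample_html))
print(sorted(data))
```

With a live session, `hidden_fields(page.text)` could stand in for the individual `select_one` lookups; a roughly equivalent BeautifulSoup approach would be to loop over `soup.select("input[type=hidden]")`. Which fields the server actually requires still has to be checked against the form data shown in the browser's dev tools.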

Padraic Cunningham
  • Thank you a lot. I did not know about the session. When I try to use your solution, I get the same error I had when I was trying to use RoboBrowser: `code: data["__viewstate"] = soup.select_one("#__viewstate")["value"] TypeError: 'NoneType' object is not subscriptable`. Do you have any idea why this happens? – Daniel Guńka Aug 30 '16 at 16:37
  • Yep, I should have used all capitals for `__VIEWSTATE`, it should work fine now. – Padraic Cunningham Aug 30 '16 at 16:38
  • Thank you a lot. After adding three more parameters to data, it works fine :) – Daniel Guńka Aug 30 '16 at 20:50
  • @PadraicCunningham, Hi, I tried this code but got this error `if page.text == open_page.text: AttributeError: 'str' object has no attribute 'text' ` So I commented those lines and tried `print open_page.content` which printed the content of the **login page** instead of the intended page to open after login, any suggestion please? Edit: printing `open_page` alone gives **Response 200**. – KiDo May 18 '18 at 12:05
  • @KiDo, what site are you trying to login to? – Padraic Cunningham May 18 '18 at 12:22