I am trying to log in to the website https://interpals.net/app/auth/login with a script. My code is the following:

#! /usr/bin/env python
# -*- coding: utf-8 -*-

from requests import Session
from bs4 import BeautifulSoup as bs

with Session() as s:
    site = s.get("https://interpals.net/app/auth/login")
    bs_content = bs(site.content, "html.parser")
    token = bs_content.find("input", {"name":"csrf_token"})["value"]
    login_data = {"username": "user", "password": "pass", "csrf_token": token}
    s.post("https://interpals.net/app/auth/login", data=login_data)
    home_page = s.get("https://interpals.net/pm.php")

My first problem is that when I pass "html.parser" the page is not parsed correctly; in fact, I do not even get the right HTML. This is what I got:

https://paste.fedoraproject.org/paste/K1SCjKBG7CAigUH4GX7qUQ

When I change "html.parser" to "lxml" or "html5lib" I do get an HTML form, which is this:

https://paste.fedoraproject.org/paste/4w-AT20kTIXoAmgsPmpqIQ

However, in this last one I cannot find the csrf_token input, which is what I need in order to log in. Could anybody give me some advice, please?

Natali Torres
  • Do you want the CSRF token from your parsed HTML string? – Tserenjamts Nov 04 '19 at 03:26
  • Yes, I want the value of that input, since it exists in the website's HTML when I look in the browser console. – Natali Torres Nov 04 '19 at 03:45
  • Find the input tag whose value you want with `.find(FILTER).get('value')`. But that input should be submitted, right? – Tserenjamts Nov 04 '19 at 03:50
  • @Tserenjamts that is what I am doing in my code, actually – Natali Torres Nov 04 '19 at 03:52
  • In your HTML I can't find an input with `name = 'csrf_token'`. Did you write that wrong? – Tserenjamts Nov 04 '19 at 03:53
  • @Tserenjamts that is my question... if you look in the console of the website, the input does exist, but I cannot get it with my parser – Natali Torres Nov 04 '19 at 03:55
  • So try using a browser to get the HTML, with something like `selenium` or `puppeteer`; that should work. There is also `MechanicalSoup`. – Tserenjamts Nov 04 '19 at 03:59
  • https://stackoverflow.com/questions/46942778/python-web-scraping-csrf-token-issue and this could be of use for your case – Tserenjamts Nov 04 '19 at 03:59
  • I see `csrf_token` but not in `<input>` - it is in `<meta>`. – furas Nov 04 '19 at 04:17
  • @furas Right, what I need is to get the csrf_token input, because it is necessary to log in to the page – Natali Torres Nov 04 '19 at 04:20
  • use `BeautifulSoup` with correct arguments in `find()` – furas Nov 04 '19 at 04:22
  • `token = bs_content.find("meta", {"name":"csrf-token"})["content"]` - see `meta` instead of `input`, `-` instead of `_` in `csrf-token` and `content` instead of `value` – furas Nov 04 '19 at 04:26
  • @furas this is useless. It is true that you can get the value from the meta tag; however, to log in I need to send it via the POST form, but the csrf_token input does not appear when I parse the page. – Natali Torres Nov 04 '19 at 04:52
  • It doesn't matter that the input doesn't exist. Probably JavaScript adds the value to the posted values when you press `Submit` on the page. But with `requests` you don't use the form on the page to send it; you send the data directly to the server. – furas Nov 04 '19 at 05:15
  • @furas I tried that but it does not work, which is why I suppose that csrf_token is actually necessary. – Natali Torres Nov 04 '19 at 23:03
  • `csrf_token` is necessary, but don't search for it in `<input>` - get it from `<meta>` and send it in `requests.post()`. – furas Nov 05 '19 at 00:47
  • @furas I tried that, it does not work – Natali Torres Nov 05 '19 at 00:48
  • Probably it uses JavaScript to check other elements or to create other cookies. You may have to use [Selenium](http://selenium-python.readthedocs.io/) to control a web browser and work with this page that way. – furas Nov 05 '19 at 01:34
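
The suggestion in the comments (read the token from a `meta` tag named `csrf-token` and send it as the `csrf_token` form field) can be sketched offline as below. This is only a minimal sketch: the tag name and attribute come from furas's comment, while `SAMPLE_HTML`, the token value `abc123`, and the helper names `extract_csrf_token` and `build_login_data` are made up for illustration. It uses the standard library's `html.parser` so it runs without extra packages; with BeautifulSoup the lookup is the `find("meta", ...)` call quoted in the comment.

```python
from html.parser import HTMLParser


class MetaTokenParser(HTMLParser):
    """Remembers the content of the first <meta name="csrf-token" ...> tag."""

    def __init__(self, meta_name="csrf-token"):
        super().__init__()
        self.meta_name = meta_name
        self.token = None

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (attribute, value) pairs.
        if tag != "meta" or self.token is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get("name") == self.meta_name:
            self.token = attr_map.get("content")


def extract_csrf_token(html, meta_name="csrf-token"):
    """Return the token from the meta tag, or None if the tag is absent."""
    parser = MetaTokenParser(meta_name)
    parser.feed(html)
    return parser.token


def build_login_data(username, password, token):
    """Build the form fields used in the question's script."""
    return {"username": username, "password": password, "csrf_token": token}


# Hypothetical page fragment; the real page's markup may differ.
SAMPLE_HTML = """
<html>
  <head>
    <meta charset="utf-8">
    <meta name="csrf-token" content="abc123">
  </head>
  <body><form action="/app/auth/login" method="post"></form></body>
</html>
"""

token = extract_csrf_token(SAMPLE_HTML)
login_data = build_login_data("user", "pass", token)
```

The resulting `login_data` dict is what would be passed to `s.post(...)` inside the `Session` block from the question. As the thread notes, sending the token this way was reported not to be enough on this particular site, so a browser-driven tool such as Selenium may still be needed.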
