Trouble parsing a website with BeautifulSoup

Question

I am trying to log in to the website https://interpals.net/app/auth/login with a script. My code is the following:

#! /usr/bin/env python
# -*- coding: utf-8 -*-

from requests import Session
from bs4 import BeautifulSoup as bs

with Session() as s:
    site = s.get("https://interpals.net/app/auth/login")
    bs_content = bs(site.content, "html.parser")
    token = bs_content.find("input", {"name":"csrf_token"})["value"]
    login_data = {"username":"user","password":"pass", "csrf_token":token}
    s.post("https://interpals.net/app/auth/login",login_data)
    home_page = s.get("https://interpals.net/pm.php")

My first problem is that when I pass "html.parser" I do not get a correct parse; in fact, I do not even get the right HTML. This is what I got:

https://paste.fedoraproject.org/paste/K1SCjKBG7CAigUH4GX7qUQ

When I change "html.parser" to "lxml" or "html5lib" I do get an HTML form, which is this:

https://paste.fedoraproject.org/paste/4w-AT20kTIXoAmgsPmpqIQ

However, in this last one I cannot find the csrf_token input, which is what I need in order to log in. Could anybody give me some advice, please?

Yes, I want the value of that input, since it exists in the website's HTML when I look at it in the browser console — Natali Torres, Nov 04 '19 at 03:45
Find the input tag you want and get its value with `.find(FILTER).get('value')`. But that input should be submitted, right? — Tserenjamts, Nov 04 '19 at 03:50
In your HTML I can't find an input with `name = 'csrf_token'`. Did you write that wrong? — Tserenjamts, Nov 04 '19 at 03:53
@Tserenjamts that is my question... if you look in the console of the website, the input does exist, but I cannot get it with my parser — Natali Torres, Nov 04 '19 at 03:55
So use a browser to get the HTML, e.g. with `selenium` or `puppeteer`; that should work. There is also `MechanicalSoup` — Tserenjamts, Nov 04 '19 at 03:59
https://stackoverflow.com/questions/46942778/python-web-scraping-csrf-token-issue might also be of use for your case — Tserenjamts, Nov 04 '19 at 03:59
@furas Right, what I need is to get the csrf_token input, because it is necessary to log in to the page — Natali Torres, Nov 04 '19 at 04:20
`token = bs_content.find("meta", {"name":"csrf-token"})["content"]` - see `meta` instead of `input`, `-` instead of `_` in `csrf-token` and `content` instead of `value` — furas, Nov 04 '19 at 04:26
@furas this is useless. It is true that you can get the value from meta; however, to log in I need to send it via the POST form, but the csrf_token input does not appear when I parse the page. — Natali Torres, Nov 04 '19 at 04:52
It doesn't matter that the input doesn't exist. Probably JavaScript adds the value to the posted values when you press `Submit` on the page. But with `requests` you don't use the form on the page to send it; you send the data directly to the server. — furas, Nov 04 '19 at 05:15
@furas I tried that but it does not work, which is why I suppose that the csrf_token input is actually necessary. — Natali Torres, Nov 04 '19 at 23:03
`csrf_token` is necessary, but don't search for it in `<input>` - get it from `<meta>` and send it in `requests.post()`. — furas, Nov 05 '19 at 00:47
Probably it uses JavaScript to check other elements or to create other cookies. You may have to use [Selenium](http://selenium-python.readthedocs.io/) to control a web browser and work with this page that way. — furas, Nov 05 '19 at 01:34
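
For anyone following the comments, here is a minimal sketch of what furas describes: read the token from the `meta` tag and put it into the POST data yourself. It runs against a made-up HTML snippet, not the real login page, so `SAMPLE_HTML` and the helper names `extract_csrf_token` and `build_login_data` are hypothetical. The POST field name `csrf_token` is taken from the code in the question and is an assumption about what the server expects.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the login page; the real markup may differ.
SAMPLE_HTML = """
<html><head>
<meta name="csrf-token" content="abc123">
</head><body>
<form action="/app/auth/login" method="post">
<input name="username"><input name="password" type="password">
</form>
</body></html>
"""


def extract_csrf_token(html):
    """Return the content of <meta name="csrf-token">, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    meta = soup.find("meta", {"name": "csrf-token"})
    return meta.get("content") if meta is not None else None


def build_login_data(html, username, password):
    """Build the POST payload; 'csrf_token' as the field name is an assumption."""
    token = extract_csrf_token(html)
    if token is None:
        raise ValueError("no csrf-token meta tag found")
    return {"username": username, "password": password, "csrf_token": token}


login_data = build_login_data(SAMPLE_HTML, "user", "pass")
print(login_data)
```

In the script from the question this would replace the `bs_content.find("input", ...)` line, with the payload sent as `s.post(url, data=login_data)` inside the same `Session`. As the last comment says, this may still not be enough if the site relies on JavaScript for other checks or cookies.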
