Web scraping behind a login with X-Auth and Bearer tokens

Question

I am writing a small script that would save countless hours for me and my colleagues. I need to fetch data about my clients from a web page based on their client number (CLIENT_NO). The whole page is behind a login page, so I sign in manually in the browser and copy the Bearer and X-Auth tokens, which should be enough to authorize these requests, right?
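One thing worth checking with manually copied tokens is whether they have already expired. The X-Auth-Token in the spider below starts with `eyAidHlwIjogIkpXVCIs`, which base64-decodes to the beginning of a JWT header (`{ "typ": "JWT",`). The following is a minimal sketch, assuming the full token is a standard three-part JWT that carries an `exp` claim; the function names are made up for illustration, and the signature is not verified.

```python
import base64
import json
import time
from typing import Optional


def decode_jwt_part(part: str) -> dict:
    """Decode one base64url-encoded JWT segment (header or payload)."""
    padded = part + "=" * (-len(part) % 4)  # restore the stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


def seconds_until_expiry(token: str, now: Optional[float] = None) -> Optional[float]:
    """Return the seconds left before the `exp` claim, or None if there is none.

    This only inspects the claims; it does NOT verify the signature.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a three-part JWT (header.payload.signature)")
    exp = decode_jwt_part(parts[1]).get("exp")
    if exp is None:
        return None
    return exp - (time.time() if now is None else now)
```

A negative result would mean the copied token is already stale and a fresh one has to be copied from the browser before the spider runs.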

Then I use the URL "https://moje.csobstavebni-oz.cz/group/nel/vysledky-vyhledavani?searchText=CLIENT_NO", which mimics a search request from the search bar. This gets me to the desired page. I am looking for data such as "birthNumberIco" and others, as highlighted in the screenshot.
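To make the placeholder explicit: CLIENT_NO in that URL stands for the real client number, so it has to be substituted before the request is sent. A small sketch of building the search URL per client follows; the helper name `build_search_url` is hypothetical, and only the base URL and the `searchText` parameter come from the URL above.

```python
from urllib.parse import urlencode

# Base of the search page quoted above, without the query string.
SEARCH_URL = "https://moje.csobstavebni-oz.cz/group/nel/vysledky-vyhledavani"


def build_search_url(client_no: str) -> str:
    """Return the search URL for one client number, with the value URL-encoded."""
    return f"{SEARCH_URL}?{urlencode({'searchText': client_no})}"
```

For example, `build_search_url("12345")` yields the search URL ending in `?searchText=12345`.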

One problem I see is that the Request URL is different from the one mentioned above. But I cannot use the Request URL, because it contains CLIENT_ID rather than CLIENT_NO, and I don't know the CLIENT_ID.

Unfortunately, I can't get anything out of it; Python always returns an empty list []. I suspect it is because of all the authorization keys and tokens (as you can see in my headers, they are deliberately truncated).
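Header values pasted from the browser often pick up a leading space or colon, and some copied headers (such as Host) describe the original request rather than the one being sent. The sketch below shows one way to normalize such a dictionary before passing it to a request. The helper name and the choice of which headers to drop are assumptions for illustration, not something the site or Scrapy requires.

```python
# Headers the HTTP client normally fills in itself; a copied value may not
# match the request actually being made (assumed list, adjust as needed).
CLIENT_MANAGED = {"host", "connection", "content-length", "accept-encoding"}


def clean_headers(raw: dict) -> dict:
    """Strip copy-paste debris from header values and drop client-managed headers."""
    cleaned = {}
    for name, value in raw.items():
        name = name.strip()
        if name.lower() in CLIENT_MANAGED:
            continue
        # Remove surrounding whitespace and any stray leading colon.
        cleaned[name] = value.strip().lstrip(":").strip()
    return cleaned
```

Calling `clean_headers(headers)` on the copied dictionary and passing the result as `headers=` keeps the authorization values intact while removing the formatting noise.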

I tried several approaches I found on YouTube, but right now I am completely desperate and don't know what else to try. Maybe I just made some small mistake, and fixing it will make the whole thing work.

[Screenshots: screenshot, screenshot2, screenshot3]

Thank you so much in advance!

import scrapy
import json

class KlientUdaje(scrapy.Spider):
    name = 'klient_udaje'
    start_urls = ['https://moje.csobstavebni-oz.cz/group/nel']

    # Headers copied from the browser request; token values are truncated.
    headers = {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "en-US,en;q=0.9,cs;q=0.8",
        "Authorization": "Bearer d2ba2XXXXXX",
        "Cache-Control": "no-cache",
        "Connection": "keep-alive",
        "Host": "moje.csobstavebni.cz",
        "Origin": "https://moje.csobstavebni-oz.cz",
        "Pragma": "no-cache",
        "Referer": "https://moje.csobstavebni-oz.cz/",
        "RequestId": "cklydjuq000073q679q5kd2tb",
        "Sec-Fetch-Dest": "empty",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Site": "cross-site",
        "SystemId": "47",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.72 Safari/537.36 Edg/89.0.774.45",
        "X-Auth-Token": "eyAidHlwIjogIkpXVCIsICJraWQiOiAiT2pDY3ErdklKTXXXXX"
    }

    def parse(self, response):
        url = 'https://moje.csobstavebni-oz.cz/group/nel/vysledky-vyhledavani?searchText=CLIENT_NO'

        yield scrapy.Request(url, 
            callback=self.parse_api, 
            headers=self.headers)
    
    def parse_api(self, response):
        raw_data = response.body
        data = json.loads(raw_data)
        rodne_cislo = data['birthNumberIco']
        print(rodne_cislo)
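Since the search URL is a page address, the response may well be HTML rather than JSON, in which case `json.loads` raises an error; and even with a JSON body, "birthNumberIco" may sit deeper than the top level. The following sketch, under those assumptions, checks whether the body parses as JSON and then searches nested data for a key. Both helper names are hypothetical, and the real structure of the response is not known from the question.

```python
import json
from typing import Any, Optional


def parse_json_or_none(body: bytes) -> Optional[Any]:
    """Return the parsed JSON body, or None if the body is not valid JSON."""
    try:
        return json.loads(body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return None


def find_key(data: Any, key: str) -> Optional[Any]:
    """Depth-first search for the first value stored under `key`."""
    if isinstance(data, dict):
        if key in data:
            return data[key]
        children = data.values()
    elif isinstance(data, list):
        children = data
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None
```

In `parse_api`, `parse_json_or_none(response.body)` returning None would indicate that the URL serves the HTML page and that the data comes from a separate request, while `find_key(data, "birthNumberIco")` avoids a KeyError when the field is nested.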

Well, the desired website does not work in Chrome, and from what I saw, Selenium works primarily with Chrome drivers. - Im_Gas, Mar 07 '21 at 18:58
