
I'm scraping the Scorebing website with requests. To do so, I'm exploring the site to locate its XHR calls, which gives me a URL like this:

[Screenshot: page position]

My code is as follows:

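import requests

# Placeholder values: replace with the headers copied from the XHR request in Postman.
headers = {"User-Agent": "Mozilla/5.0", "X-Requested-With": "XMLHttpRequest"}
url = 'https://lv.scorebing.com/ajax/score/data?mt=0&nr=1&corner=1'

# A GET request needs no body, so no data= argument is passed.
response = requests.get(url, headers=headers)
response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
data = response.json()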

No problems there. My problem is that if I switch tabs, for example from Corner to Fixture, no new XHR request is made. In fact, only "Live Matches" and "Corners" allow this direct XHR access. I can see that some JS scripts are loaded, but I can't get from there to replicating my previous step.

[Screenshot: new page position]

I know I can scrape this with Selenium, and probably with a regular requests call to the page URL plus BeautifulSoup, but what I don't understand is why some tabs make XHR calls to load their data while other, similar ones use JS. I would like to know how you can reverse engineer those JS calls to get an API similar to the one in the first part.

jizhihaoSAMA
puppet

1 Answer


First, you should know that the XHR (XMLHttpRequest) filter in Chrome's DevTools records all the Ajax requests the page makes.
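If you want to review those requests outside the browser, one option (my suggestion, not part of the original answer) is to click through the tabs with the DevTools Network panel open and export the log as a HAR file, which is plain JSON. The minimal sketch below lists the requests whose response was JSON, using only standard HAR 1.2 fields. The function name `json_requests` is hypothetical, and filtering on the MIME type is a heuristic: a server that labels JSON as `text/html` would be missed.

```python
from typing import Any


def json_requests(har: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (method, url) for each HAR entry whose response MIME type mentions JSON.

    Reads only standard HAR 1.2 fields:
    log.entries[].request.method / .url and log.entries[].response.content.mimeType.
    """
    found: list[tuple[str, str]] = []
    for entry in har.get("log", {}).get("entries", []):
        mime = entry.get("response", {}).get("content", {}).get("mimeType", "")
        if "json" in mime.lower():
            request = entry["request"]
            found.append((request["method"], request["url"]))
    return found
```

Usage: load the exported file with `json.load(open("export.har", encoding="utf-8"))` (the file name is hypothetical) and pass the result to `json_requests`. Any URL it prints can then be tried with `requests.get` exactly as in the question.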


What's Ajax?

Ajax is a set of web development techniques using many web technologies on the client side to create asynchronous web applications.

Ajax can be implemented in plain JavaScript or with jQuery (jQuery is a JavaScript library, so it is essentially JavaScript, but it offers an API for Ajax).
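Whichever way the page issues the call, the server just receives an ordinary HTTP request, so it can be replayed from Python. jQuery adds an `X-Requested-With: XMLHttpRequest` header to same-origin Ajax requests, and some servers check for it. The sketch below builds (without sending) a request for the endpoint from the question. It uses the standard library's `urllib` rather than `requests` so it runs without extra packages. The helper name `build_ajax_request` is hypothetical, the `Accept` value is an assumption modelled on a JSON Ajax call, and the real site may also require cookies or other headers visible in DevTools.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_ajax_request(base_url: str, params: dict[str, object]) -> Request:
    """Build a GET request shaped like a same-origin jQuery Ajax call."""
    query = urlencode(params)  # e.g. {"mt": 0, "nr": 1} -> "mt=0&nr=1"
    url = f"{base_url}?{query}" if query else base_url
    headers = {
        # jQuery sets this header on same-origin Ajax requests by default.
        "X-Requested-With": "XMLHttpRequest",
        # Assumption: advertise that JSON is expected, as a JSON Ajax call would.
        "Accept": "application/json, text/javascript, */*; q=0.01",
    }
    return Request(url, headers=headers, method="GET")
```

To actually send it, pass the result to `urllib.request.urlopen` and decode the body with `json.load`; with `requests`, the same `headers` dict can go into `requests.get(url, headers=headers)`.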

In your example page, there are many Ajax requests in the source code:

[Screenshots: Ajax requests in the page's JavaScript source]
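To find such calls without reading every script by hand, a crude regular-expression scan can pull out the URL argument of common Ajax call sites. The patterns below cover jQuery's `$.ajax({url: ...})`, `$.get`, `$.post` and `$.getJSON`, plus `fetch(...)` and `XMLHttpRequest.open(...)`. They are assumptions about how the site's scripts are written: URLs built by string concatenation or hidden by minification will be missed, and `url:` keys unrelated to Ajax may produce false positives. The name `find_ajax_urls` is hypothetical.

```python
import re

# Each pattern captures a quoted URL literal passed to a common Ajax API.
_AJAX_PATTERNS = [
    re.compile(r"""url\s*:\s*(['"])(?P<url>[^'"]+)\1"""),                    # $.ajax({url: '...'})
    re.compile(r"""\$\.(?:get|post|getJSON)\(\s*(['"])(?P<url>[^'"]+)\1"""),  # $.get('...') etc.
    re.compile(r"""\bfetch\(\s*(['"])(?P<url>[^'"]+)\1"""),                   # fetch('...')
    re.compile(r"""\.open\(\s*(['"])\w+\1\s*,\s*(['"])(?P<url>[^'"]+)\2"""),  # xhr.open('GET', '...')
]


def find_ajax_urls(js_source: str) -> list[str]:
    """Return URL literals passed to Ajax calls, deduplicated, in source order."""
    hits: list[tuple[int, str]] = []
    for pattern in _AJAX_PATTERNS:
        for match in pattern.finditer(js_source):
            hits.append((match.start(), match.group("url")))
    ordered: list[str] = []
    seen: set[str] = set()
    for _, url in sorted(hits):  # sort by position in the script
        if url not in seen:
            seen.add(url)
            ordered.append(url)
    return ordered
```

The returned paths are usually relative, so they still need to be joined with the site's base URL (for example with `urllib.parse.urljoin`) before requesting them.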


I would like to know how can you reverse engineer those js calls in order to get an API similar to the first part.

If you really want to do it from the source code alone, you should:

  1. Send a GET request to the page.
  2. Analyze the page's source code, then go through each JavaScript file it loads (again with a GET request).
  3. Find all the Ajax requests, send GET requests to them as well, and select the data you need.
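The three steps can be sketched with the standard library alone. This is a minimal, hypothetical crawler: `ScriptSrcParser`, `candidate_endpoints` and `crawl` are illustrative names, and the heuristic in step 3 (keep quoted string literals containing `/ajax/`, the path prefix of the endpoint in the question) is an assumption that may need adjusting. Inline `<script>` blocks are ignored for brevity.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class ScriptSrcParser(HTMLParser):
    """Step 2: collect the src attribute of every <script> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.srcs: list[str] = []

    def handle_starttag(self, tag, attrs):
        src = dict(attrs).get("src")
        if tag == "script" and src:
            self.srcs.append(src)


def candidate_endpoints(js_source: str, keyword: str = "/ajax/") -> list[str]:
    """Step 3 (heuristic): quoted string literals containing `keyword`, deduplicated."""
    literals = (m.group(2) for m in re.finditer(r"""(['"])([^'"\n]*)\1""", js_source))
    return list(dict.fromkeys(lit for lit in literals if keyword in lit))


def fetch_text(url: str) -> str:
    """GET a URL and decode the body (a browser-like User-Agent is an assumption)."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(page_url: str) -> dict[str, list[str]]:
    """Steps 1-3: map each script URL to the absolute candidate endpoints found in it."""
    parser = ScriptSrcParser()
    parser.feed(fetch_text(page_url))  # step 1: GET the page itself
    results: dict[str, list[str]] = {}
    for src in parser.srcs:
        script_url = urljoin(page_url, src)  # resolve relative script paths
        try:
            js = fetch_text(script_url)
        except OSError:  # urllib's URLError/HTTPError subclass OSError; skip failures
            continue
        endpoints = candidate_endpoints(js)
        if endpoints:
            # Ajax URLs are resolved against the page URL, as the browser does.
            results[script_url] = [urljoin(page_url, e) for e in endpoints]
    return results
```

For example, `crawl("https://lv.scorebing.com/")` (assuming that is the page in question) would return the candidate Ajax URLs per script; each one can then be requested the same way as the `/ajax/score/data` URL in the question.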
jizhihaoSAMA
  • You mean to use the URL in your picture, that is `url='lv.scorebing.com/assets/js/comment.js?_31'`, and do a requests.get? I'm quite new to anything JavaScript-related besides Selenium – puppet May 21 '20 at 18:44
  • @puppet All the `js` URLs, not only `comment.js`. – jizhihaoSAMA May 22 '20 at 03:12
  • Sorry, I know this might seem trivial to you, but as I said, I know nothing about JS. Do you mean that I need to request every single JS file in order to get anything? If so, how would I chain the requests together? Or do you mean I need to request, independently of each other, only the JS files that load the parts of the data that matter to me? – puppet May 22 '20 at 21:38
  • Again, `XHR` records `Ajax` requests. The data you need is not in the page's source code. When you load the page in the browser, it executes the `JavaScript`, and because that `JavaScript` contains `Ajax` requests, the browser also sends requests to those `Ajax` URLs (for example, one of them is the URL you already found, `'https://lv.scorebing.com/ajax/score/data?mt=0&nr=1&corner=1'`). The returned data is then inserted into the page. – jizhihaoSAMA May 23 '20 at 02:49