Scrape dynamic web page with Python (input dates)

Question

I'm trying to find a way to iterate through dates over a long period of time. The site is: https://www.nnbulgaria.com/life-insurance/insurance-plans/investment-insurance-nn-pro/value-of-investment-unit and it contains a table with specific values for each date (starting on 06/01/2017, formatted MM/DD/YYYY). The table changes with the date input, so I need to be able to loop through dates or a range of dates and then extract the table data. (There is also a graph with all the values, but I can't find the dynamic content in the page source.)

The scraped data may be formatted or not (the values are in separate td tags); I can reshape it once it's downloaded. So far I have read about options using Selenium, but I don't have Chrome installed, so I'm looking for other ways. Help is appreciated.

If the page uses JavaScript to read data, you should first check in DevTools (in Chrome/Firefox, tab "Network") all requests to the server and the data in the responses. You may find your data in one of the responses, and then you can use that URL to get the data without `Selenium`. Often you get it as JSON, which can easily be converted to a Python dictionary or list (using the `json` module) - furas, Apr 22 '20 at 22:23
This page reads some data from the URL https://www.nnbulgaria.com/Orchard.Nn/public/chartsUVData?chart-startdate=2004-06-01&chart-enddate=2020-04-23&value-per-share-type=LiPro - furas, Apr 22 '20 at 22:26
The URL in the previous comment has the dates `'chart-startdate'` and `'chart-enddate'`, so you can put the dates directly in the URL and you don't need a `'POST'` form. - furas, Apr 22 '20 at 22:30
@furas brilliant, thx! I'm curious how u dew it tho. :) This appears to have all the value in JSON format. But I'll cope. :) - Julian, Apr 22 '20 at 22:42
I described in my answer how I found this URL. The `requests` module can convert `JSON` data to a Python dictionary/list (`data = response.json()`), so you have the data without HTML scraping - furas, Apr 22 '20 at 22:45
Found it, thx! I had to reload the page and see the preview, plenty of stuff loading. Thx again! - Julian, Apr 22 '20 at 23:06
There are buttons (filters) to display only some requests: the `XHR` button/filter shows only `AJAX` requests. - furas, Apr 22 '20 at 23:11

furas · Answer 1 · 2020-04-22T22:43:28.430

This page loads its data with JavaScript/AJAX (XHR).

Using DevTools in Chrome or Firefox (tab: Network, filter: XHR), you can see all the requests that JavaScript sends to the server and all the data in the responses.

This way you can see that it reads data from this URL:

https://www.nnbulgaria.com/Orchard.Nn/public/chartsUVData?chart-startdate=2004-06-01&chart-enddate=2020-04-23&value-per-share-type=LiPro

and it gets JSON data, which you can easily convert to a Python dictionary.
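As a minimal sketch of that conversion, the standard `json` module turns a JSON string into a dictionary. The string below is a hand-written sample that only imitates the shape of the response (parallel lists under keys such as `labels` and `dataLowRisk`, with values copied from the result shown in this answer); the real response has more keys and many more entries.

```python
import json

# Hand-written sample imitating the shape of the response;
# the real payload contains more keys and far more entries.
sample_text = '{"labels": ["02.06.2017", "08.06.2017"], "dataLowRisk": [1.0, 0.9999]}'

data = json.loads(sample_text)  # JSON object -> Python dict

print(type(data).__name__)
print(data["labels"])
print(data["dataLowRisk"][1])
```

With `requests` the same step is done by `response.json()`, so `json.loads` is only needed when you already hold the raw text.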

In the URL you can see the dates chart-startdate= and chart-enddate=, so if you change the dates you should get different data, and you don't need to use a POST form for this.
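If you do want to loop over date ranges, as the question asks, one possible sketch is to generate consecutive windows and build one parameter dictionary per window. The helper names `date_windows` and `build_params` and the 30-day window size are made up for illustration. Whether the server limits the range it accepts is not known here, and a single request covering the whole period may be enough.

```python
from datetime import date, timedelta


def date_windows(start, end, days=30):
    """Yield (window_start, window_end) pairs covering start..end inclusive."""
    current = start
    while current <= end:
        # Each window spans `days` calendar days, clipped at `end`
        window_end = min(current + timedelta(days=days - 1), end)
        yield current, window_end
        current = window_end + timedelta(days=1)


def build_params(window_start, window_end, share_type="LiPro"):
    """Build the query parameters seen in the URL for one date window."""
    return {
        "chart-startdate": window_start.isoformat(),  # YYYY-MM-DD, as in the URL
        "chart-enddate": window_end.isoformat(),
        "value-per-share-type": share_type,
    }


windows = list(date_windows(date(2017, 6, 1), date(2017, 8, 15), days=30))
for window_start, window_end in windows:
    print(build_params(window_start, window_end))
```

Each dictionary can be passed as `params=` to `requests.get`, and the responses can then be joined into one data set.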

And you don't need Selenium for it:

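import requests

# Endpoint found in DevTools (Network tab, XHR filter)
url = 'https://www.nnbulgaria.com/Orchard.Nn/public/chartsUVData'

# Query parameters; change the dates to request a different period
params = {
    'chart-startdate': '2004-06-01',
    'chart-enddate': '2020-04-23',
    'value-per-share-type': 'LiPro',
}

r = requests.get(url, params=params)
r.raise_for_status()  # stop early on an HTTP error status
data = r.json()       # parse the JSON response into a dictionary

# Each key holds a list; 'labels' holds the dates for the other lists
print(data.keys())

# Print the date, low-risk value and balanced value side by side
for label, lowrisk, balanced in zip(data['labels'], data['dataLowRisk'], data['dataBalanced']):
    print(label, lowrisk, balanced)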

Result

dict_keys(['labels', 'dataLowRisk', 'dataBalanced', 'dataAggressive', 'dataCommodities', 'dataMoneyMarket', 'dataUSEquities', 'dataGermanEquities', 'dataTechnologyCompaniesEquities'])

02.06.2017 1.0 0.99434
08.06.2017 0.9999 0.99387
14.06.2017 1.00092 0.99564
20.06.2017 1.0059 1.00039
26.06.2017 1.00375 0.99676
30.06.2017 0.99521 0.98354
06.07.2017 0.9932 0.98518
12.07.2017 0.99384 0.98384
18.07.2017 1.00056 0.9944
24.07.2017 0.99827 0.99075
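
Since the question mentions reshaping the data after download, here is a rough sketch that turns the parallel lists into one row per date and writes them as CSV. It uses a small sample copied from the first lines of the output above instead of a live request. It assumes the labels are in DD.MM.YYYY format (suggested by entries such as 30.06.2017) and that every list has the same length as `labels`.

```python
import csv
import io
from datetime import datetime

# Sample copied from the first lines of the output above;
# in practice `data` would come from r.json().
data = {
    "labels": ["02.06.2017", "08.06.2017", "14.06.2017"],
    "dataLowRisk": [1.0, 0.9999, 1.00092],
    "dataBalanced": [0.99434, 0.99387, 0.99564],
}

# Every key except 'labels' is treated as a value series
series = [key for key in data if key != "labels"]

rows = []
for i, label in enumerate(data["labels"]):
    # Assumption: labels are DD.MM.YYYY; store them as ISO dates
    row = {"date": datetime.strptime(label, "%d.%m.%Y").date().isoformat()}
    for key in series:
        row[key] = data[key][i]
    rows.append(row)

# Write to an in-memory buffer; use open('file.csv', 'w', newline='') for a file
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["date"] + series)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

The list of dictionaries in `rows` can also be passed to other tools if you prefer a different output format.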
