How can I get the link from this table content (I guess it's JavaScript)? (Without Selenium)

Question

I'm trying to get the href values from this table's contents, but they are not available in the HTML source. [edited @ 3:44 pm 10/02/2019] I will scrape this site, and others similar to it, on a daily basis and compare the results with the previous day's data, so that I get only the new information each day. [/edited]

I found a similar (but simpler) solution, but it uses chromedriver (link). I'm looking for a solution that doesn't use Selenium.

Site: http://web.cvm.gov.br/app/esforcosrestritos/#/detalharOferta?ano=MjAxOQ%3D%3D&valor=MTE%3D&comunicado=MQ%3D%3D&situacao=Mg%3D%3D

If you click on the first part of the table (as below)

You will get to this site: http://web.cvm.gov.br/app/esforcosrestritos/#/enviarFormularioEncerramento?type=dmlldw%3D%3D&ofertaId=ODc2MA%3D%3D&state=eyJhbm8iOiJNakF4T1E9PSIsInZhbG9yIjoiTVRFPSIsImNvbXVuaWNhZG8iOiJNUT09Iiwic2l0dWFjYW8iOiJNZz09In0%3D

How can I scrape the first site to get all the links it has in its tables (so I can go on to the second "links")?

When I use requests.get it doesn't even get the content of the table. Any help?

import requests

link_cvm = "http://web.cvm.gov.br/app/esforcosrestritos/#/detalharOferta?ano=MjAxOQ%3D%3D&valor=MTE%3D&comunicado=MQ%3D%3D&situacao=Mg%3D%3D"
response = requests.get(link_cvm)
# Print the HTML the server returned; the table contents are not in it
print(response.text)

Is this a one-time thing? I only ask because you can easily download all the raw data manually from the DevTools "Network" tab. — Ayman Safadi, Oct 02 '19 at 17:44
Hi @Ayman, no. I will scrap this site and others similar to this one, on a daily basis and compare with the "yesterday" data. So I get the daily new info in this data. — Felipe Ribeiro, Oct 02 '19 at 18:44
FYI it’s __scrape__ (and __scraping__, __scraper__, __scraped__) not scrap. ‘To scrap’ means to throw away like rubbish :-( — DisappointedByUnaccountableMod, May 18 '21 at 18:28

score 1 · Accepted Answer · answered Oct 02 '19 at 19:21

The second page you are taken to is loaded dynamically using JavaScript. The data you are looking for is served from a separate URL, in JSON format. Search around; there is a lot of information about this. For one example of many, see this.
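As a small illustration of why requests.get on the page URL does not return the table: everything after the "#" is a URL fragment, which is handled by the JavaScript in the browser and is not sent to the server. The sketch below only splits the URL from the question with the standard library and makes no network call; the variable names are chosen for illustration.

```python
from urllib.parse import urlsplit

# Page URL copied from the question
page_url = (
    "http://web.cvm.gov.br/app/esforcosrestritos/#/detalharOferta"
    "?ano=MjAxOQ%3D%3D&valor=MTE%3D&comunicado=MQ%3D%3D&situacao=Mg%3D%3D"
)

parts = urlsplit(page_url)

# The part an HTTP client actually requests from the server
requested_url = parts.scheme + "://" + parts.netloc + parts.path
# The part after '#', which stays on the client side
fragment = parts.fragment

print("Requested from server:", requested_url)
print("Kept by the client:   ", fragment)
```

So the server only ever sees the base application URL, and the "?ano=...&valor=..." part never reaches it.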

In your case, you can get to it this way:

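import requests

# JSON endpoint behind the second page; 8760 appears to be the offer ID
url = 'http://web.cvm.gov.br/app/esforcosrestritos/enviarFormularioEncerramento/getOfertaPorId/8760'
resp = requests.get(url)
resp.raise_for_status()  # stop here if the server returned an HTTP error

data = resp.json()  # parse the JSON body into Python objects
print(data)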

The output is the information on that page.
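The query values in the two URLs from the question look like percent-encoded Base64. Here is a minimal sketch, assuming that reading is right, that decodes them with the standard library and rebuilds the second-page link from an offer ID. The helper names (decode_b64, encode_b64, fragment_params, detail_link) are made up for illustration. Whether the site accepts links built this way for other IDs is an assumption you would need to check in the browser, and the endpoint that lists the IDs for the first table is not shown in this thread, so it still has to be found in the DevTools "Network" tab.

```python
import base64
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "http://web.cvm.gov.br/app/esforcosrestritos/"

# Second-page URL copied from the question
DETAIL_URL = (
    BASE + "#/enviarFormularioEncerramento"
    "?type=dmlldw%3D%3D&ofertaId=ODc2MA%3D%3D"
    "&state=eyJhbm8iOiJNakF4T1E9PSIsInZhbG9yIjoiTVRFPSIsImNvbXVuaWNhZG8i"
    "OiJNUT09Iiwic2l0dWFjYW8iOiJNZz09In0%3D"
)


def decode_b64(value):
    """Decode a Base64 string to text."""
    return base64.b64decode(value).decode("utf-8")


def encode_b64(value):
    """Encode text as a Base64 string."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def fragment_params(page_url):
    """Return the query parameters that sit inside the '#' fragment."""
    fragment = urlsplit(page_url).fragment
    query = fragment.partition("?")[2]
    # parse_qs also undoes the percent-encoding (%3D becomes '=')
    return {key: values[0] for key, values in parse_qs(query).items()}


def detail_link(offer_id, state):
    """Build a second-page link for an offer ID (hypothetical helper)."""
    query = urlencode({
        "type": encode_b64("view"),
        "ofertaId": encode_b64(str(offer_id)),
        "state": state,
    })
    return BASE + "#/enviarFormularioEncerramento?" + query


params = fragment_params(DETAIL_URL)
offer_id = decode_b64(params["ofertaId"])
api_url = BASE + "enviarFormularioEncerramento/getOfertaPorId/" + offer_id

print(offer_id)
print(api_url)
print(detail_link(offer_id, params["state"]) == DETAIL_URL)
```

For the URL in the question, ofertaId decodes to 8760, which is the same number used in the JSON endpoint above.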

Jack Fleeting


Tks @JackFleeting, what I need is to get the links to these second pages. Once I'm on the second page I can get the data. Any ideas? – Felipe Ribeiro Oct 02 '19 at 19:45
@FelipeRibeiro - Click on the link in my answer and read up on using the developer tab in the browser to track down dynamically loaded data. – Jack Fleeting Oct 02 '19 at 19:47
Tks friend. From what I saw, the example just gets the data, not the "href" to the other pages; at least that is what I understood from the example you sent. Did I get it right? Tks a lot for your time and patience, best. – Felipe Ribeiro Oct 02 '19 at 19:59
@FelipeRibeiro - It's NOT a simple process, so you have a lot to learn... Try this too: https://ianlondon.github.io/blog/web-scraping-discovering-hidden-apis/. Also, don't forget to accept the answer. – Jack Fleeting Oct 02 '19 at 20:16
Tks @Jack. I'm looking forward to doing it. Very complex indeed. If I find other solutions I'll post them here. – Felipe Ribeiro Oct 03 '19 at 17:11
Hi @Jack, I posted another question here: https://stackoverflow.com/questions/58341926/how-to-get-the-last-table-from-this-site-python . I solved the problem from this question in a more "manual" way, but now I need help getting the last table's info (it is not in JSON, as far as I could see). Tks in advance. – Felipe Ribeiro Oct 11 '19 at 13:08
