
I'm trying to scrape data from the ASX (Australian Stock Exchange) site. For example, on the ASX page for BHP, there is a collection of fundamentals data at the bottom of the page. The selector for the values, e.g. EPS, is:

#company_key_statistics > div > div.panel-body.row > div:nth-child(3) > table > tbody > tr:nth-child(8) > td

I tried

library(rvest)
ASX_bhp <- read_html("https://www2.asx.com.au/markets/company/bhp")
ASX_data <- ASX_bhp |> html_elements("td") |> html_text()

Instead of "td", I have also tried "tr", "#company_key_statistics", and the whole selector string. However, all of them return an empty character vector. I also tried html_nodes instead of html_elements, with the same result.

How should I extract fundamental data from this site?

Rubén
Isaiah

1 Answer


All that data is fetched and rendered through JavaScript, so it is not present in the HTML that rvest receives (at least not through that URL). But you can use their API:

library(jsonlite)
bhp <- fromJSON("https://asx.api.markitdigital.com/asx-research/1.0/companies/bhp/key-statistics")
bhp$data$earningsPerShare
#> [1] 5.95708

Created on 2022-09-19 with reprex v2.0.2
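
If you need more than one company or field, the same request could be wrapped in a small helper. This is only a sketch: `key_stats_url()` and `get_key_stat()` are hypothetical names, and it assumes that other tickers follow the same URL pattern as the `bhp` endpoint above and that the response keeps its values under `data`, neither of which I have verified beyond BHP.

```r
library(jsonlite)

# Hypothetical helper: builds the key-statistics URL for a ticker.
# Assumption: other tickers use the same pattern as the "bhp" URL above.
key_stats_url <- function(ticker) {
  sprintf(
    "https://asx.api.markitdigital.com/asx-research/1.0/companies/%s/key-statistics",
    tolower(ticker)
  )
}

# Hypothetical helper: returns a single field from the response,
# or NA if the request fails or the field is not present.
get_key_stat <- function(ticker, field = "earningsPerShare") {
  res <- tryCatch(fromJSON(key_stats_url(ticker)), error = function(e) NULL)
  value <- res$data[[field]]
  if (is.null(value)) NA_real_ else value
}
```

With that in place, `get_key_stat("bhp")` should return the same value as `bhp$data$earningsPerShare`. Returning `NA` on failure is a design choice that keeps something like `sapply(tickers, get_key_stat)` from stopping at the first bad ticker.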

margusl
  • Thanks so much @margusl! How could you tell that all that data is fetched and presented through JavaScript? – Isaiah Sep 18 '22 at 22:40
    You could just js for that site, but it's also easy to spot from the Network tab of the browser dev tools. And instead of the Element Inspector, it makes sense to view the actual page source and search for the text values you'd expect to find in the rendered tables. And sometimes it's just a good idea to check what rvest can actually retrieve, i.e. saving the page content to a file and taking a closer look with an editor and/or browser:
    ```r
    library(rvest)
    ASX_bhp <- read_html("https://www2.asx.com.au/markets/company/bhp")
    tmp <- tempfile("raw", fileext = ".html")
    message(tmp)
    write(as.character(ASX_bhp), tmp)
    ```
    – margusl Sep 18 '22 at 23:09
    Mh, it was supposed to read "disable JS for that site" in the previous comment. – margusl Sep 19 '22 at 05:54
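
The "search the raw source for the value you expect" check from the comments can also be done from R. The sketch below is only an illustration: `in_static_html()` is a hypothetical helper, and the two pages are tiny hand-written stand-ins built with `minimal_html()`, not the real ASX markup.

```r
library(rvest)

# Hypothetical helper: TRUE if `text` occurs anywhere in the markup that
# rvest received, i.e. before any JavaScript has had a chance to run.
in_static_html <- function(page, text) {
  grepl(text, as.character(page), fixed = TRUE)
}

# Stand-in for a page whose table is already in the HTML
static_page <- minimal_html(
  "<table><tr><td>EPS</td><td>5.96</td></tr></table>"
)

# Stand-in for a page that only ships an empty container plus a script
dynamic_page <- minimal_html(
  "<div id='company_key_statistics'></div><script src='app.js'></script>"
)

in_static_html(static_page, "5.96")
#> [1] TRUE
in_static_html(dynamic_page, "5.96")
#> [1] FALSE

# The same symptom as in the question: no <td> elements to select
length(html_elements(dynamic_page, "td"))
#> [1] 0
```

If the value you can see in the browser is missing from `as.character(page)`, no CSS selector will find it, and looking for the underlying API request in the Network tab is usually the next step.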