I am trying to download several tables from this website using rvest and SelectorGadget.

The CSS selector that SelectorGadget reports is "#main li".

When I run the following code, I unfortunately get an empty table.

library(rvest)

psh <- read_html("https://pershingsquareholdings.com/performance/net-asset-value-and-returns/")

psh.node <- html_node(psh, "#main li") 

psh.table = html_table(psh, fill = TRUE)

I guess the site prevents scraping, but it would otherwise be great if an alternative way could be recommended to get the data.

Thanks in advance!

1 Answer

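The problem is that the data is not an HTML <table> but a list (<ul> elements containing <li> items), so html_table() has nothing to parse. You can build the table from the list items instead: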

library(rvest)
library(purrr)

psh <- read_html("https://pershingsquareholdings.com/performance/net-asset-value-and-returns/")

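# Select the container that holds the list-based "table"
psh.node <- html_element(psh, "#main .psh_table") 

# Column names come from the <li> items of the heading row
headers <- psh.node %>% html_element(".psh_table_row.headings") %>% 
  html_elements("li") %>% html_text()

# One row per <ul>; this selector also matches the heading row,
# which is why the headings appear again as row 1 of the output
table <-  psh.node %>% html_elements("ul.psh_table_row")  %>%
  map_dfr(~ html_elements(., "li") %>% html_text() %>% set_names(headers))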

table
#> # A tibble: 44 × 10
#>    `As of Date` Period  USDNAV…¹ Euron…² GBPNA…³ LSE G…⁴ LSE U…⁵ MTDRe…⁶ QTDRe…⁷
#>    <chr>        <chr>   <chr>    <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
#>  1 As of Date   Period  USDNAV/… Eurone… GBPNAV… LSE GB… LSE US… MTDRet… QTDRet…
#>  2 25 October   Weekly  $51.01   $32.60  £44.47  £28.45  $32.63  12.0%   12.0%  
#>  3 18 October   Weekly  $47.81   $30.25  £42.22  £26.70  $30.23  5.0%    5.0%   
#>  4 11 October   Weekly  $44.96   $29.35  £40.93  £26.35  $29.55  -1.3%   -1.3%  
#>  5 30 September Monthly $45.55   $30.00  £40.79  £27.00  $30.25  -4.5%   8.2%   
#>  6 27 September Weekly  $45.49   $30.25  £42.44  £28.00  $30.23  -4.6%   8.1%   
#>  7 20 September Weekly  $47.88   $32.00  £42.06  £27.80  $31.75  0.4%    13.8%  
#>  8 13 September Weekly  $49.11   $32.00  £42.71  £27.70  $32.13  2.7%    16.4%  
#>  9 6 September  Weekly  $47.80   $32.25  £41.51  £28.10  $32.48  -0.1%   13.3%  
#> 10 31 August    Monthly $47.83   $32.70  £41.17  £27.90  $32.83  2.9%    13.4%  
#> # … with 34 more rows, 1 more variable: YTDReturn <chr>, and abbreviated
#> #   variable names ¹​`USDNAV/Share`, ²​`EuronextPrice/Share`, ³​`GBPNAV/Share`,
#> #   ⁴​`LSE GBPPrice/Share`, ⁵​`LSE USDPrice/Share`, ⁶​MTDReturn, ⁷​QTDReturn
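
Since every column comes back as character and the heading row is repeated as row 1, a cleanup step is probably needed before the numbers can be used. Here is a minimal offline sketch of one way to do that. The inline HTML is a made-up stand-in that only reuses the class names from above (the real page's markup may differ), and `to_number` is a hypothetical helper, not part of rvest:

```r
library(rvest)
library(purrr)

# Hypothetical markup mimicking the class names used above (an assumption)
html <- minimal_html('
<div id="main">
  <div class="psh_table">
    <ul class="psh_table_row headings"><li>As of Date</li><li>USDNAV/Share</li><li>MTDReturn</li></ul>
    <ul class="psh_table_row"><li>25 October</li><li>$51.01</li><li>12.0%</li></ul>
    <ul class="psh_table_row"><li>11 October</li><li>$44.96</li><li>-1.3%</li></ul>
  </div>
</div>')

node <- html_element(html, "#main .psh_table")

headers <- node %>% html_element(".psh_table_row.headings") %>%
  html_elements("li") %>% html_text()

# Same extraction as above: one row per <ul>, headings row included
raw <- node %>% html_elements("ul.psh_table_row") %>%
  map_dfr(~ html_elements(.x, "li") %>% html_text() %>% set_names(headers))

# Drop the repeated heading row (its first cell equals the first header)
clean <- raw[raw[[1]] != headers[1], ]

# Hypothetical helper: strip currency and percent symbols, then convert
to_number <- function(x) as.numeric(gsub("[^0-9.-]", "", x))

# Convert every column except the date column
num_cols <- names(clean)[-1]
clean[num_cols] <- lapply(clean[num_cols], to_number)

clean
```

On the real data you would probably want to leave text columns such as Period out of `num_cols`. Cells that are empty or not numeric would turn into NA.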

Edit

To find all similar tables:

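# Select every list-based "table" container on the page
psh.nodes <- html_elements(psh, "#main .psh_table") 

# Apply the same extraction to each container; the result is a list of tibbles
tables <- map(psh.nodes, function(psh.node){
  headers <- psh.node %>% html_element(".psh_table_row.headings") %>% 
    html_elements("li") %>% html_text()
  
  # As before, the heading row is matched too and shows up as row 1
  table <-  psh.node %>% html_elements("ul.psh_table_row")  %>%
    map_dfr(~ html_elements(., "li") %>% html_text() %>% set_names(headers))
  
  table
})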
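
The result is an unnamed list of tibbles, so it may help to check its shape before using it. A small sketch with a toy stand-in for `tables` (the values and the two-column layout are illustrative assumptions, not the scraped data):

```r
library(purrr)
library(dplyr)

# Toy stand-in for the scraped `tables` list (illustrative values only)
tables <- list(
  tibble(`As of Date` = c("25 October", "18 October"),
         Period = c("Weekly", "Weekly")),
  tibble(`As of Date` = "31 December", Period = "Monthly")
)

# Number of rows in each table
map_int(tables, nrow)

# Stack the tables, recording each row's source table in a `table` column
combined <- bind_rows(tables, .id = "table")
combined
```

Stacking with `bind_rows()` only makes sense if the tables share the same columns. Otherwise it is probably easier to work with them individually, e.g. `tables[[1]]`.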
Ric