I'm trying to scrape tables of data from different pages on fbref.com using rvest. I've been able to scrape the data from one page using:

library(rvest)
URL <- "https://fbref.com/en/squads/822bd0ba/Liverpool"
WS <- read_html(URL)
passStats <- WS %>% rvest::html_nodes(xpath = '//*[(@id = "ks_sched_all")]') %>% rvest::html_table() %>% data.frame()

but when I try to apply it to multiple pages using a for loop I have a problem as not all of the pages use the same id for the table. Some are "ks_sched_all" but others are "ks_sched_(4-digit number)". Is there any way to just extract any table on the page with an id starting with: "ks_sched_"?

Conor
  • Have you considered using [starts-with](https://stackoverflow.com/questions/3301898/) in your XPath? – Ian Campbell Jun 01 '20 at 19:24
  • Thanks, I tried that using `xpath = "//*[starts-with(@id, 'ks_sched_')]"`, but then it doesn't scrape it as a table and gives the error `html_name(x) == "table" is not TRUE`. Any idea why that's happening? – Conor Jun 01 '20 at 23:32

1 Answer

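You can restrict your XPath expression to `table` elements and wrap it in parentheses `()` so that each match can be selected by position. With `//*`, the expression can also match non-table elements, and `html_table()` then fails with the `html_name(x) == "table"` error. The code could be: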

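library(rvest)
URL <- "https://fbref.com/en/squads/822bd0ba/Liverpool"
WS <- read_html(URL)

# Count the tables whose id starts with "ks_sched_"
n_tables <- length(html_nodes(x = WS, xpath = "//table[starts-with(@id,'ks_sched_')]"))

results <- list()

# seq_len() gives an empty sequence when no table matches, unlike 1:length(...)
for (i in seq_len(n_tables)) {
  # Select the i-th matching table in document order
  path <- paste0('(//table[starts-with(@id,"ks_sched_")])[', i, ']')
  results[[i]] <- WS %>% html_nodes(xpath = path) %>% html_table() %>% data.frame()
}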

We use a for loop, get the number of tables with length, generate a new XPath each time with paste0 and store the results in a list.

Output: a list of 7 data frames
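
As a possible alternative to building a new XPath on every iteration, `html_table()` can be applied to the whole node set at once. The sketch below is only an illustration: it parses a small made-up HTML snippet (not the real fbref.com page) so that it runs offline, and the ids `ks_sched_3232`, `ks_sched_note` and `other`, as well as the table contents, are hypothetical.

```r
library(rvest)

# Hypothetical stand-in for a squad page; the real fbref.com markup may differ.
page <- read_html('
<html><body>
  <div id="ks_sched_note">not a table</div>
  <table id="ks_sched_all">
    <tr><th>Date</th><th>Opponent</th></tr>
    <tr><td>2020-01-02</td><td>Sheffield Utd</td></tr>
  </table>
  <table id="ks_sched_3232">
    <tr><th>Date</th><th>Opponent</th></tr>
    <tr><td>2020-01-11</td><td>Tottenham</td></tr>
    <tr><td>2020-01-19</td><td>Manchester Utd</td></tr>
  </table>
  <table id="other">
    <tr><th>x</th></tr>
    <tr><td>1</td></tr>
  </table>
</body></html>')

# Only <table> elements whose id starts with "ks_sched_" are selected,
# so the <div> with a matching id prefix is ignored.
nodes <- html_nodes(page, xpath = "//table[starts-with(@id, 'ks_sched_')]")

# html_table() on a node set returns one table per node.
results <- lapply(html_table(nodes), as.data.frame)

# Name each data frame after the id of the table it came from.
names(results) <- html_attr(nodes, "id")

print(names(results))
```

Naming the list by table id makes it easier to tell the tables apart when the same code is run over several pages, since the numeric suffix differs from page to page.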

E.Wiest