Scraping similarly named tables using rvest

Question

I'm trying to scrape tables of data from different pages on fbref.com using rvest. I've been able to scrape the data from one page using:

library(rvest)
URL <- "https://fbref.com/en/squads/822bd0ba/Liverpool"
WS <- read_html(URL)
passStats <- WS %>% rvest::html_nodes(xpath = '//*[(@id = "ks_sched_all")]') %>% rvest::html_table() %>% data.frame()

but when I try to apply it to multiple pages using a for loop I have a problem as not all of the pages use the same id for the table. Some are "ks_sched_all" but others are "ks_sched_(4-digit number)". Is there any way to just extract any table on the page with an id starting with: "ks_sched_"?

Have you considered using [starts-with](https://stackoverflow.com/questions/3301898/) in your xpath? — Ian Campbell, Jun 01 '20 at 19:24
Thanks, I tried that using `xpath = "//*[starts-with(@id, 'ks_sched_')]"` but then it doesn't scrape it as a table and gives the error `html_name(x) == "table" is not TRUE`. Any idea why that's happening? — Conor, Jun 01 '20 at 23:32

score 1 · Accepted Answer · answered Jun 02 '20 at 03:23

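You can restrict the XPath expression to table elements (`//table[...]` instead of `//*[...]`) and wrap it in parentheses so that the matches can be indexed one at a time. The code could be: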

library(rvest)
URL <- "https://fbref.com/en/squads/822bd0ba/Liverpool"
WS <- read_html(URL)


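results <- list()

# Count the tables whose id starts with "ks_sched_"
n_tables <- length(html_nodes(x = WS, xpath = "//table[starts-with(@id,'ks_sched_')]"))

# seq_len() gives an empty sequence when no table matches, so the loop is skipped
for (i in seq_len(n_tables)) {
  # Build an XPath that selects only the i-th matching table
  path <- paste0('(//table[starts-with(@id,"ks_sched_")])[', i, ']')
  results[[i]] <- WS %>% html_nodes(xpath = path) %>% html_table() %>% data.frame()
}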

We use a for loop, get the number of tables with length, generate a new XPath each time with paste0 and store the results in a list.

Output: a list of 7 data frames
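
As a possible alternative (a sketch, not part of the original answer), the loop can be skipped: `html_nodes()` returns every matching table at once, and `html_table()` applied to that node set returns a list with one data frame per table. The inline HTML below is a made-up stand-in for an fbref page, used only so the example runs without network access. The non-table element whose id also starts with "ks_sched_" is an assumption about why `//*[...]` triggered the `html_name(x) == "table"` error.

```r
library(rvest)

# Hypothetical stand-in for a squad page: two tables plus one non-table
# element whose ids all start with "ks_sched_" (the real page layout may differ)
page <- read_html('<div>
  <span id="ks_sched_link">not a table</span>
  <table id="ks_sched_all"><tr><th>Date</th><th>GF</th></tr><tr><td>2020-01-01</td><td>2</td></tr></table>
  <table id="ks_sched_1234"><tr><th>Date</th><th>GF</th></tr><tr><td>2020-02-01</td><td>1</td></tr></table>
</div>')

# "//*" also matches the <span>, which html_table() cannot parse
any_nodes <- page %>% html_nodes(xpath = "//*[starts-with(@id, 'ks_sched_')]")
node_names <- html_name(any_nodes)

# "//table" keeps only table elements
tables <- page %>% html_nodes(xpath = "//table[starts-with(@id, 'ks_sched_')]")

# html_table() on a node set returns a list, one data frame per table
results <- html_table(tables)

# Name each list entry after the table's id so it can be looked up later
names(results) <- html_attr(tables, "id")
```

With the real page, `page` would be replaced by `WS` from the code above, and `results[["ks_sched_all"]]` would then pick a table by its id, provided that id exists on the page.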
