
I'm trying to use R to scrape certain elements from a table on a website. I think I'm just making simple syntax errors, but I can't seem to figure out what I'm doing wrong.

Here's my code:

library(rvest)
library(stringr)

testlinkurl <- "https://www.pro-football-reference.com/boxscores/202209080ram.htm"
testlinkpage <- read_html(testlinkurl)

Here's what the page looks like:


...and here's the source code for it.


I'm trying to scrape the teams and the quarter scores from the table and save them to a CSV file. I've tried a few code snippets without success:

row <- html_attr(html_nodes(testlinkpage, xpath="//class="visitor center""))%>%  html_table()
Error: unexpected symbol in "row <- html_attr(html_nodes(testlinkpage, xpath="//class="visitor"

row <- html_attr(html_nodes(testlinkpage, xpath="//*table[@class='statistics']"))
Warning message:
In xml_find_all.xml_node(x, make_selector(css, xpath)) :
  Invalid expression [1207]

row <- html_attr(html_nodes(testlinkpage, xpath='/*[@id="content"]')) %>% html_table()
list()

Any and all input would be appreciated. Thank you!

CJM

1 Answer


Here's one option: select the line-score table by its CSS class (.linescore) and let html_table() parse it:

library(rvest)
library(dplyr)


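# Select the line-score table by its CSS class; html_table() returns a list of tibbles, one per match
tbl <- rvest::read_html('https://www.pro-football-reference.com/boxscores/202209080ram.htm') %>% 
  rvest::html_elements('.linescore') %>% 
  rvest::html_table()

# The first column has no header, so name every column and then drop that one
tbl[[1]] %>% 
  setNames(c('rm', 'Team', "Q1", "Q2", "Q3", "Q4", "Final")) %>% 
  select(-rm)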


#> # A tibble: 2 × 6
#>   Team                Q1    Q2    Q3    Q4 Final
#>   <chr>            <int> <int> <int> <int> <int>
#> 1 Buffalo Bills        7     3     7    14    31
#> 2 Los Angeles Rams     0    10     0     0    10

Created on 2023-04-28 by the reprex package (v2.0.1)
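Since the question tried XPath and also wanted a CSV, here is a hedged sketch of an XPath equivalent that ends with write.csv(). In the question's attempts, the first one fails because the inner double quotes end the R string early. The second fails because //*table is not a valid XPath step (* already stands for "any element", so it can't be followed by a name). html_attr() returns attribute values as strings rather than nodes, and it needs a name argument, so it should not sit between the node selection and html_table(). The sketch parses an inline, simplified copy of the table. That copy is a hypothetical reconstruction, not the page's real markup, though the scores come from the output above. It lets the example run offline; the file name linescore.csv is also an arbitrary choice.

```r
library(rvest)
library(dplyr)

# Hypothetical, simplified stand-in for the box score's line-score table so the
# sketch runs offline; the scores are copied from the output above.
page <- read_html('
<table class="linescore">
  <thead>
    <tr><th></th><th></th><th>1</th><th>2</th><th>3</th><th>4</th><th>Final</th></tr>
  </thead>
  <tbody>
    <tr><td></td><td><a href="#">Buffalo Bills</a></td><td>7</td><td>3</td><td>7</td><td>14</td><td>31</td></tr>
    <tr><td></td><td><a href="#">Los Angeles Rams</a></td><td>0</td><td>10</td><td>0</td><td>0</td><td>10</td></tr>
  </tbody>
</table>')

# XPath instead of a CSS selector: the expression is in double quotes and the
# class name in single quotes, so the R string is not cut short. contains() is
# a substring test, so it also matches a class list such as "linescore other".
scores <- page %>%
  html_element(xpath = "//table[contains(@class, 'linescore')]") %>%
  html_table() %>%
  # The first column has no header; name it so it can be dropped.
  setNames(c("rm", "Team", "Q1", "Q2", "Q3", "Q4", "Final")) %>%
  select(-rm)

# Write the teams and quarter scores to a CSV file (in a temporary directory here).
csv_path <- file.path(tempdir(), "linescore.csv")
write.csv(scores, csv_path, row.names = FALSE)
```

Against the live page, page would instead be read_html(testlinkurl) from the question. The names passed to setNames() assume the same seven-column layout that the answer's output shows.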

Matt