
I'm trying to scrape multiple pages of sports data using the rvest and glue packages. I'm having trouble with the nesting, and I think it's because the table on the website has a two-line header (some column headings are one line, others are two). Here's the code I've started with. I checked that the site allows scraping (using Python), and it does.

library(tidyverse)
library(rvest) # interacting with HTML and web content
library(glue)

Webpage (week 1, position 1): https://fantasy.nfl.com/research/scoringleaders?position=1&sort=pts&statCategory=stats&statSeason=2019&statType=weekStats&statWeek=1

Function to scrape a single week (1:17) and position (1:4):

salary_scrape_19 <- function(week, position) {
  Sys.sleep(3)  # pause between requests so the server isn't hammered
  cat(".")      # progress indicator: one dot per page

  url <- glue("https://fantasy.nfl.com/research/scoringleaders?position={position}&sort=pts&statCategory=stats&statSeason=2019&statType=weekStats&statWeek={week}")
  read_html(url) %>%
    html_nodes("table") %>%
    html_table() %>%
    purrr::flatten_df()
    # %>% set_names(...)  # need to clean the headers before I can set this
}

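# scaffold: one row per week/position pair (not shown in the original post; assumed here)
scaffold <- crossing(week = 1:17, position = 1:4)

scraped_df <- scaffold %>%
  mutate(data = map2(week, position, ~salary_scrape_19(.x, .y)))

scraped_df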

Ultimately, I want a scrape function that returns all four positions (QB, RB, WR, and TE), with the same columns, for every week of 2019. (I'd eventually like to add a third glue variable, {year}, but I need to get this working first.)
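Adding the season as a third variable mostly means one more placeholder in the URL template. A minimal sketch, assuming the other query parameters stay as in the URL above; `build_url()` is a hypothetical helper name, and glue vectorises over its inputs, so one call builds every URL:

```r
library(glue)
library(tidyr)

# Hypothetical helper: the URL from the question with statSeason made a variable.
build_url <- function(year, week, position) {
  glue("https://fantasy.nfl.com/research/scoringleaders?position={position}&sort=pts&statCategory=stats&statSeason={year}&statType=weekStats&statWeek={week}")
}

# One row per season/week/position combination (crossing() also sorts them)
scaffold <- crossing(year = 2019, week = 1:17, position = 1:4)

# glue recycles the three columns element-wise; as.character() drops the glue class
scaffold$url <- as.character(build_url(scaffold$year, scaffold$week, scaffold$position))
```

The `url` column could then be passed to a version of the scraping function that takes a URL instead of building it, and more seasons are just extra values in `year`.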

Again, I think the issue comes from the table's irregular header on the site: some column headings take up one row and others span two.

Zoe
Jeff Henderson

1 Answer


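html_table() uses only the first header row as column names, so the second header row ends up as the first row of data. We can paste that row onto the original column names and then remove it.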

library(tidyverse)
library(rvest)

salary_scrape_19 <- function(week, position) {

  url <- glue::glue("https://fantasy.nfl.com/research/scoringleaders?position={position}&sort=pts&statCategory=stats&statSeason=2019&statType=weekStats&statWeek={week}")
  read_html(url) %>% 
    html_nodes("table") %>% 
    html_table() %>%
    .[[1]] %>%                                # keep the first table on the page
    set_names(paste0(names(.), .[1, ])) %>%   # header row 1 + header row 2
    slice(-1)                                 # drop the old second header row
}
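The paste-then-drop step can be checked offline on a made-up table. The HTML below is an assumption loosely modelled on the layout the question describes, not the real page's markup, and `combine_header_rows()` is a hypothetical helper (assumes rvest 1.0 or later). Besides combining the two header rows, it treats an `NA` under a one-row heading as blank and re-guesses column types once the header text is gone:

```r
library(rvest)

# Made-up table with a two-row header (NOT the real fantasy.nfl.com markup)
page <- read_html(paste0(
  "<table>",
  "<tr><th>Player</th><th>Passing</th><th>Passing</th><th>Points</th></tr>",
  "<tr><th></th><th>Yds</th><th>TD</th><th></th></tr>",
  "<tr><td>A</td><td>300</td><td>2</td><td>20.5</td></tr>",
  "<tr><td>B</td><td>250</td><td>1</td><td>14.0</td></tr>",
  "</table>"
))

# Only the first <th> row becomes names; the second header row is data row 1
raw <- html_table(html_nodes(page, "table"))[[1]]

# Hypothetical helper: glue header row 2 onto the names, drop it, re-type columns
combine_header_rows <- function(tbl) {
  # first cell of each column = second header row (may be NA if it was blank)
  second <- vapply(tbl, function(col) as.character(col[1]), character(1))
  second[is.na(second)] <- ""
  # drop that row and let type.convert() re-guess each column's type
  out <- lapply(tbl, function(col) utils::type.convert(as.character(col[-1]), as.is = TRUE))
  names(out) <- paste0(names(tbl), second)
  tibble::as_tibble(out)
}

fixed <- combine_header_rows(raw)
```

Inside `salary_scrape_19()`, the `set_names()`/`slice()` pair could be swapped for this helper. The `NA` handling matters because `html_table()` converts column types by default, so a column whose heading occupies only one row can come back with `NA` in place of the blank second-row cell, and `paste0()` would then produce a name like `PointsNA`.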

We can then use map2 to scrape the data for each week/position combination.

Trying it on a small sample scaffold:

scaffold <- data.frame(week = c(1, 2), position = c(1, 2))
scraped_df <- scaffold %>% mutate(data = map2(week, position, salary_scrape_19))
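To end up with one data frame covering every week and position, the list column produced by `map2()` can be stacked with `tidyr::unnest()`. A sketch that runs offline, assuming a hypothetical stand-in `fake_scrape()` in place of the real scraper:

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical stand-in for salary_scrape_19() so the example needs no network;
# every call returns a table with the same columns.
fake_scrape <- function(week, position) {
  tibble(Player = c("A", "B"), FantasyPoints = c(10, 5) * week)
}

scaffold <- crossing(week = 1:2, position = 1:2)
scraped_df <- scaffold %>%
  mutate(data = map2(week, position, fake_scrape))

# Stack all per-page tables; week and position stay as identifying columns
all_weeks <- scraped_df %>% unnest(cols = data)
```

`unnest()` needs each column to have the same type in every table (a column that is numeric on one page and character on another will not combine), so doing the type conversion inside the scraping function keeps the pages compatible.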
Ronak Shah