I'm trying to scrape multiple pages of sports data using the rvest and glue packages. I'm having trouble with the nesting, and I think it's because the table on the website has a two-line header (some column headers span one line, some span two). Here's the code I started with. I checked to make sure the site allows scraping with python, and all is good there.
library(tidyverse)
library(rvest) # interacting with HTML and web content
library(glue)
Function to scrape a selected week (1:17) and position (1:4):
salary_scrape_19 <- function(week, position) {
  Sys.sleep(3) # pause between requests
  cat(".")     # progress indicator
  url <- glue("https://fantasy.nfl.com/research/scoringleaders?position={position}&sort=pts&statCategory=stats&statSeason=2019&statType=weekStats&statWeek={week}")
  read_html(url) %>%
    html_nodes("table") %>%
    html_table() %>%
    purrr::flatten_df()
  # set_names() would go here, but I need to clean the headers before I can set this
}
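# Assumed definition (scaffold is not shown in the snippet): one row per week/position pair
scaffold <- crossing(week = 1:17, position = 1:4)
scraped_df <- scaffold %>%
  mutate(data = map2(week, position, ~salary_scrape_19(.x, .y)))
scraped_df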
Ultimately, I want to build a scrape function that gets all the positions with the same columns (QB, RB, WR, and TE) for all weeks in 2019. (I want to add a third variable, {year}, to the glue call eventually, but I need to get this working first.)
Again, I think the issue has to do with the wonky headers of the table on the site, as some headings span one row and others span two.
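To illustrate what I mean by cleaning the headers, here is a minimal sketch on made-up data. It assumes the first header line is available as a character vector and the second header line ends up as the first data row, which I have not verified for this site. The helper combine_headers() and all column names and values below are hypothetical.

```r
# Hypothetical helper: merge a top header line and a sub header line
# into a single set of column names.
combine_headers <- function(top_row, sub_row) {
  top_row <- trimws(top_row)
  sub_row <- trimws(sub_row)
  # Keep the sub header alone when the top line is blank or just repeats it
  ifelse(top_row == "" | top_row == sub_row,
         sub_row,
         paste(top_row, sub_row, sep = "_"))
}

# Made-up example, not the site's real columns
top_row <- c("", "Passing", "Passing", "Rushing")
toy <- data.frame(
  V1 = c("Player", "A. Example", "B. Example"),
  V2 = c("Yds", "250", "180"),
  V3 = c("TD", "2", "1"),
  V4 = c("Yds", "30", "5"),
  stringsAsFactors = FALSE
)

# The first data row holds the second header line (assumption)
sub_row <- unlist(toy[1, ], use.names = FALSE)

clean <- toy[-1, ]                      # drop the row that was really a header
names(clean) <- combine_headers(top_row, sub_row)
rownames(clean) <- NULL
print(clean)
```

If something like this is the right idea, the combined names could then be passed to set_names() inside the scrape function so every position ends up with the same columns.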