I have a function that scrapes several attributes of an athlete, including their birth date, from the official IAAF athletics site. I've simplified it for the purposes of this question:
library(xml2)
library(tidyverse)
library(stringi)
library(rvest)

# Global list that the function fills in via `<<-`
upscope_list <- list()

scrape_function_mod <- function(athlete_name) {
  # Transliterate accented characters before building the search URL
  starting_name <- stri_trans_general(athlete_name, "latin-ascii")
  initial_url <-
    paste0("https://www.iaaf.org/athletes/search?query=", starting_name)
  initial_search_page <- read_html(initial_url)
  rawnodes_text <-
    initial_search_page %>%
    html_nodes("table td") %>%
    html_text(trim = TRUE) %>%
    stri_trans_general("latin-ascii")
  name_split <- as_vector(strsplit(starting_name, " ", fixed = TRUE))
  # Index of the table cell that contains both the first and the last name
  number <- which(sapply(rawnodes_text, function(x)
    grepl(name_split[1], x, ignore.case = TRUE) &
      grepl(name_split[length(name_split)], x, ignore.case = TRUE)))
  # The birth date sits four cells after the name cell
  upscope_list[[athlete_name]][["birth_date"]] <<-
    rawnodes_text[(number + 4)] %>% as.Date("%d %B %Y")
  return(rawnodes_text[(number + 4)] %>% as.Date("%d %B %Y"))
}
Most of the function code isn't important here; only the last two statements matter. If I run:
> scrape_function_mod("Ashton Eaton")
[1] "1988-01-21"
This returns a proper Date object for the athlete's birth date. However, the value that I assign into the list created at the start is different: it is a plain four-digit number that I can't make sense of.
> upscope_list[["Ashton Eaton"]][["birth_date"]]
[1] 6594
As you can see, the value I assign to the list and the value I return come from the same expression, so they should be identical, but they are not. How can I get the value stored as a proper Date inside the function?
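Looking at it a bit further, here is a small network-free sketch of what I have worked out so far. It assumes the stored number is the Date's underlying count of days since 1970-01-01, which the arithmetic below seems to confirm. `demo_list` is a hypothetical stand-in for `upscope_list`, and the date is hard-coded rather than scraped. Storing the date inside a list that is built up front appears to keep the class, though I am not sure this is the right way to do it inside the function:

```r
# Hard-coded stand-in for the scraped value (ISO format, so no locale issues)
birth_date <- as.Date("1988-01-21")

# Stripping the class shows the number stored underneath the Date
unclass(birth_date)
# [1] 6594

# Assumption: the number counts days since 1970-01-01
restored <- as.Date(6594, origin = "1970-01-01")
format(restored)
# [1] "1988-01-21"

# Hypothetical stand-in for upscope_list: build the inner list explicitly
# and assign it as a whole, instead of indexing two levels deep at once
demo_list <- list()
demo_list[["Ashton Eaton"]] <- list(birth_date = birth_date)
class(demo_list[["Ashton Eaton"]][["birth_date"]])
# [1] "Date"
```

So the number itself is not garbage; it looks like the class attribute is lost somewhere in the nested `[[<-` assignment, and I would like to understand why.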