I am working in R with tennis ranking data that tracks how the rankings of all players on the ATP tour evolve over time.
An example of the data I am using, covering the rankings from the 2000s, can be found here: https://github.com/JeffSackmann/tennis_atp/blob/master/atp_rankings_00s.csv
To clean up the data:
library(tidyverse)
rankings <- read_csv("data/atp/atp_rankings_00s.csv")
# Parse ranking_date into a Date column and drop the original column
rankings <- rankings %>%
  mutate(rankingDate = lubridate::ymd(ranking_date)) %>%
  select(-ranking_date)
Now, suppose I wish to trace the time evolution of each player over the entire decade, and calculate their mean ranking during this period. Then I can write:
rankings %>%
  group_by(player) %>%
  summarise(
    meanRanking = mean(rank, na.rm = TRUE)
  )
However, suppose I want something more: to slice this data along the time axis and calculate the mean ranking within each slice. For example, with start = 01-01-2000, end = 01-01-2008, skip = 2 years, I would get mean rankings over consecutive 2-year time windows for the period from 1 Jan 2000 to 1 Jan 2008. How would one code such 'time slicing' in R?
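To make the goal concrete, here is a rough sketch of the kind of slicing I have in mind. It is only one possible approach, assuming base R's `cut()` on dates is acceptable for binning. The toy data is made up, and the names `start`, `end`, `breaks`, `window` and `windowed` are hypothetical, chosen only for illustration.

```r
library(dplyr)

# Toy data standing in for the real rankings table (values are made up)
rankings <- tibble(
  rankingDate = as.Date(c("2000-03-01", "2001-06-01", "2002-02-01",
                          "2003-09-01", "2000-05-01", "2002-07-01")),
  player = c(1, 1, 1, 1, 2, 2),
  rank   = c(10, 20, 30, 50, 5, 7)
)

start <- as.Date("2000-01-01")
end   <- as.Date("2008-01-01")

# Window boundaries every 2 years from start to end (inclusive)
breaks <- seq(start, end, by = "2 years")

windowed <- rankings %>%
  # Keep only dates inside [start, end)
  filter(rankingDate >= start, rankingDate < end) %>%
  # Label each row with the start date of the window it falls into
  mutate(window = cut(rankingDate, breaks = breaks, right = FALSE)) %>%
  group_by(player, window) %>%
  summarise(meanRanking = mean(rank, na.rm = TRUE), .groups = "drop")

print(windowed)
```

Each window here is closed on the left and open on the right, so a ranking dated exactly on a boundary counts toward the window that starts on that date. Windows with no rankings for a player simply do not appear in the result.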