
I'm fairly new to rolling windows. I'd like to calculate a function that compares a window in the data against all windows of the same size before or after it, for example via a correlation. Assume no gaps. I'd like to use a tidyverse-esque approach such as tsibble and/or Davis Vaughan's slider.


df <- structure(list(sales = c(2, 4, 6, 2, 8, 10, 9, 3, 5, 2),
  index = structure(c(1567123200, 1567209600, 1567296000, 1567382400, 1567468800,
    1567555200, 1567641600, 1567728000, 1567814400, 1567900800),
    class = c("POSIXct", "POSIXt"), tzone = "UTC")),
  row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))

Suppose I want to calculate the Pearson correlation between the first 3 days of the series and every subsequent 3-day window:
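For the general "one window vs. every other window of the same size" version (overlapping windows, before or after the reference), a minimal base-R sketch could look like the following. `window_cors` is a hypothetical helper written for illustration; it is not part of tsibble or slider.

```r
# Hypothetical helper: correlate one reference window of length n with every
# other overlapping window of the same length, before or after it.
window_cors <- function(x, n, ref_start = 1) {
  starts <- seq_len(length(x) - n + 1)             # every possible window start
  ref <- x[ref_start:(ref_start + n - 1)]           # the reference window
  cors <- vapply(starts, function(s) cor(ref, x[s:(s + n - 1)]), numeric(1))
  data.frame(start = starts, cor = cors)[starts != ref_start, ]  # drop self-match
}

sales <- c(2, 4, 6, 2, 8, 10, 9, 3, 5, 2)
res <- window_cors(sales, n = 3, ref_start = 1)
res
```

Note that `cor()` returns `NA` (with a warning) for a window whose values are all equal. With slider, something like `slider::slide_dbl(sales, ~ cor(sales[1:3], .x), .after = 2, .complete = TRUE)` should give the same correlations indexed by window start, with `NA` where a full window does not fit.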


Thomas Speidel

1 Answer


We could create a grouping index with `gl` for every 3 rows after removing the first 3 rows, then compute `cor` between the first 3 values of 'sales' and each of the subsequent 3-row blocks.
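To see what the `gl` call produces, here is what it returns for the 7 rows left after dropping the first 3. `gl(n, k, length)` builds a factor with `n` levels, each repeated `k` times, truncated to `length`:

```r
# 7 remaining rows, blocks of 3: the last block is incomplete (1 row)
grp <- as.integer(gl(7, 3, 7))
grp
#> [1] 1 1 1 2 2 2 3

# An equivalent block index without factors
grp2 <- (seq_len(7) - 1) %/% 3 + 1
```

The trailing block `3` has a single row, which is why the answer keeps only groups with `n() == n`.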

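library(dplyr)
n <- 3  # window size
df %>%
    slice(-seq_len(n)) %>%                             # drop the reference window (first n rows)
    group_by(grp = as.integer(gl(n(), n, n()))) %>%    # consecutive blocks of n rows
    filter(n() == n) %>%                               # keep complete blocks only
    summarise(cor = cor(df$sales[seq_len(n)], sales))  # correlate each block with the first n values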

-output

# A tibble: 2 x 2
#    grp    cor
#  <int>  <dbl>
#1     1  0.961
#2     2 -0.655
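For comparison, a minimal base-R sketch of the same non-overlapping-block idea (no dplyr, with the sales vector typed in directly) might be:

```r
sales <- c(2, 4, 6, 2, 8, 10, 9, 3, 5, 2)
n <- 3

ref  <- sales[seq_len(n)]                  # reference window: first n days
rest <- sales[-seq_len(n)]                 # everything after it
grp  <- (seq_along(rest) - 1) %/% n + 1    # block id: 1,1,1,2,2,2,3
blocks <- split(rest, grp)
blocks <- blocks[lengths(blocks) == n]     # keep complete blocks only

block_cor <- vapply(blocks, function(b) cor(ref, b), numeric(1))
round(block_cor, 3)
```

This should give the same two correlations as the tibble above (0.961 and -0.655).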

data

df <- data.frame(sales = c(2, 4, 6, 2, 8, 10, 9, 3, 5, 2),
  index = seq(as.Date("2019-08-30"), length.out = 10, by = '1 day'))
akrun
  • Thanks, this is clever and simple! How would you modify `slice(-seq_len(n))` so that one can more easily specify a particular interval in a series (e.g. `filter(between(date, as.Date("2019-08-30"), as.Date("2019-09-01")))`)? – Thomas Speidel Oct 12 '20 at 01:17
  • @ThomasSpeidel If you are using `filter`, then negate the condition (`!`). – akrun Oct 12 '20 at 23:51
  • Yes, thanks. To make it more manageable, I filtered out the interval of interest: `pattern <- df %>% filter(between(date, as.Date("2014-06-05"), as.Date("2014-06-23"))); df2 <- df %>% anti_join(pattern, by = "date")` – Thomas Speidel Oct 13 '20 at 14:55
  • You could do this in a single chain `df %>% filter(between(date, as.Date("2014-06-05"), as.Date("2014-06-23"))) %>% anti_join(df, ., by = 'date')` – akrun Oct 13 '20 at 23:21
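A minimal base-R sketch of the interval idea from these comments, assuming the `data` frame from the answer (date column `index`): the reference window is picked by a date range, and the rows outside it (the `!` negation) are split into consecutive blocks of the same size.

```r
df <- data.frame(sales = c(2, 4, 6, 2, 8, 10, 9, 3, 5, 2),
                 index = seq(as.Date("2019-08-30"), length.out = 10, by = "1 day"))

# Reference window chosen by date interval instead of slice()
in_ref <- df$index >= as.Date("2019-08-30") & df$index <= as.Date("2019-09-01")
ref  <- df$sales[in_ref]                   # pattern window
rest <- df$sales[!in_ref]                  # rows outside the interval
n    <- length(ref)

grp    <- (seq_along(rest) - 1) %/% n + 1  # consecutive blocks of n rows
blocks <- split(rest, grp)
blocks <- blocks[lengths(blocks) == n]     # keep complete blocks only
res_int <- vapply(blocks, function(b) cor(ref, b), numeric(1))
res_int
```

One caveat: if the interval sits in the middle of the series, the rows before and after it are concatenated, so a block may straddle the removed interval. Splitting the "before" and "after" parts separately avoids that.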