1

I have a large panel data set and I would like to lag and lead a variable by 1 month and 6 business days. I know, for instance, from dplyr there is the lag or lead function. However, I also need to group by data based on the "Names" in the panel data.

My data look like this:

structure(list(Date = c("01.08.2018", "02.08.2018", "03.08.2018", 
"04.08.2018", "05.08.2018", "06.04.2019", "07.04.2019", "08.04.2019", 
"01.08.2018", "02.08.2018", "03.08.2018", "04.08.2018", "06.04.2019", 
"07.04.2019", "08.04.2019", "01.08.2018", "02.08.2018", "03.08.2018", 
"04.08.2018", "05.08.2018", "07.04.2019", "08.04.2019"), Name = c("A", 
"A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", 
"B", "C", "C", "C", "C", "C", "C", "C"), Rating = c(1L, 1L, 1L, 
3L, 3L, 4L, 4L, 4L, 3L, 3L, 2L, 2L, 2L, 1L, 1L, 1L, 3L, 3L, 3L, 
5L, 5L, 5L), Size = c(1234L, 24123L, 23L, 1L, 23L, 3L, 23L, 4L, 
323L, 3424L, 523L, 234L, 35L, 354L, 45L, 23L, 46L, 456L, 546L, 
24L, 134L, 1L)), class = "data.frame", row.names = c(NA, -22L
))

It is just a simplified version. My real data lasts from 01.08.2018 to 31.12.2021. How can I only lag and lead the variable called "Rating" by 1 month and 6 business days?

My difficulty is that I have 1 month and 6 business days and not just one variable in the dataframe. All the other variables should not be adjusted.

So far I tried this:

Data_2  <- Data  %>%
  group_by(Name)  %>%
  lag('Rating')
Data_3  <- Data  %>%
  group_by(Name)  %>%
  lead('Rating')

But this is not what I am aiming for.

EDIT:

My output should look like this in the case of lead: (I just used the first 5 rows to illustrate)

structure(list(Date = c("10.09.2018", "11.09.2018", "12.09.2018", 
"13.09.2018", "14.09.2018"), Name = c("A", "A", "A", "A", "A"
), Rating = c(1L, 1L, 1L, 3L, 3L), Size = c("Size from 10.09.2018 would be here", 
"Size from 11.09.2018 would be here", "Size from 12.09.2018 would be here", 
"Size from 13.09.2018 would be here", "Size from 14.09.2018 would be here"
)), class = "data.frame", row.names = c(NA, -5L))

So for row 1 I added 1 month and 6 business days which gives me 10.09.2018 and so on. The "Rating" will then be the one from 01.08.2018 but the "Size" will be the figure that was actually also reported on 10.09.2018. Then, I would like to do the same but go backwards 1 month and 6 business days.

remo
  • 365
  • 1
  • 10
  • 2
    Can you clarify your question please? Can you give an example of the output you want, even if just one or two examples? What does it mean to you to lag the Rating variable -- do you want to replace the value with something matched from approximately 1 year before and/or after? – Jon Spring May 06 '22 at 18:45
  • Thank you Jon Spring for your comment! I updated my question. If you still don't understand it, just let me know. – remo May 06 '22 at 18:57
  • 3
    How exact do you need to be? "1 month and 6 business days" is ambiguous depending on whether that should reflect holidays (whose?) and how you want to treat months of different duration: eg what date is one month after January 31? Is it the same date that is 1 month after January 30 and January 29 (and sometimes January 28)? – Jon Spring May 06 '22 at 19:28
  • 2
    If you can disambiguate the "1 month and 6 business days" problem, then the `slider` package does support something they call "index sliding" which supports "n days". There are perhaps other ways to approach it, but no other package atm that I know of and use. – r2evans May 06 '22 at 19:32
  • @JonSpring, thanks for the comment. 1 month after January 31 should be February 28 or 29 depending on the year if possible and then add the 6 business days. If there is no way to do that, we can try with just 1 month instead or 26 days. – remo May 06 '22 at 19:40
  • @r2evans thank you for the comment. If possible, 1 month and 6 business days would be excellent. Otherwise the assumption of 26 days could work too. – remo May 06 '22 at 19:42
  • @JonSpring Regarding the holidays, we can neglect that. – remo May 06 '22 at 19:42

1 Answers1

1

Here's an approach that would work for "x days later." In this case I use 2 days later to demonstrate on your data, but 35 days later might be good to get the 5 week later #, with same day of week and so should be another "business day" most of the time.

# Convert dates to a date format that can be calculated upon
Data2 <- Data %>% mutate(Date = lubridate::dmy(Date))
         
Data2 %>%
  mutate(Date_future = Date + 2) %>%
  left_join(Data2, by = c("Name", "Date_future" = "Date"), 
            suffix = c("_now", "_future"))
  # pipe into line below to just show selected columns
  # select(Date_future, Name, Rating_now, Size_future)

Result

        Date Name Rating_now Size_now Date_future Rating_future Size_future
1  2018-08-01    A          1     1234  2018-08-03             1          23
2  2018-08-02    A          1    24123  2018-08-04             3           1
3  2018-08-03    A          1       23  2018-08-05             3          23
4  2018-08-04    A          3        1  2018-08-06            NA          NA
5  2018-08-05    A          3       23  2018-08-07            NA          NA
6  2019-04-06    A          4        3  2019-04-08             4           4
7  2019-04-07    A          4       23  2019-04-09            NA          NA
8  2019-04-08    A          4        4  2019-04-10            NA          NA
9  2018-08-01    A          3      323  2018-08-03             1          23
10 2018-08-02    B          3     3424  2018-08-04             2         234
11 2018-08-03    B          2      523  2018-08-05            NA          NA
12 2018-08-04    B          2      234  2018-08-06            NA          NA
13 2019-04-06    B          2       35  2019-04-08             1          45
14 2019-04-07    B          1      354  2019-04-09            NA          NA
15 2019-04-08    B          1       45  2019-04-10            NA          NA
16 2018-08-01    C          1       23  2018-08-03             3         456
17 2018-08-02    C          3       46  2018-08-04             3         546
18 2018-08-03    C          3      456  2018-08-05             5          24
19 2018-08-04    C          3      546  2018-08-06            NA          NA
20 2018-08-05    C          5       24  2018-08-07            NA          NA
21 2019-04-07    C          5      134  2019-04-09            NA          NA
22 2019-04-08    C          5        1  2019-04-10            NA          NA
Jon Spring
  • 55,165
  • 4
  • 35
  • 53
  • Thank you very much for the quick response. I will try it with my original data and will accept the answer if it works. – remo May 06 '22 at 19:53