10

I have a dataframe containing a set of variables that I want to lag at different lengths so that I can use them in regressions later on (instead of lagging one variable at a time manually).

I found this code on Stack Overflow that seems to do the trick:

df = data.frame(a = 1:10, b = 21:30)
dplyr::mutate_all(df, lag)
    a  b
1  NA NA
2   1 21
3   2 22
4   3 23
5   4 24
6   5 25
7   6 26
8   7 27
9   8 28
10  9 29

The problem is that this lags every column, and I have some columns that I don't want lagged. How do I adapt the above code so that those columns are excluded? Also, how do I lag by different lengths? Right now it only lags by 1, the default setting.

Andycode

2 Answers

18

I keep googling up this same Q&A and then noting that mutate_at() and mutate_if() are now superseded by across(), which provides a slightly easier-to-remember approach for the "mutate all except these columns" pattern:

> library(dplyr)
> df = data.frame(a = 1:10, b = 21:30, c = 31:40, d = 41:50)
> df
    a  b  c  d
1   1 21 31 41
2   2 22 32 42
3   3 23 33 43
4   4 24 34 44
5   5 25 35 45
6   6 26 36 46
7   7 27 37 47
8   8 28 38 48
9   9 29 39 49
10 10 30 40 50
> # everything but columns b and c
> df %>% mutate(across(!b & !c, lag))
    a  b  c  d
1  NA 21 31 NA
2   1 22 32 41
3   2 23 33 42
4   3 24 34 43
5   4 25 35 44
6   5 26 36 45
7   6 27 37 46
8   7 28 38 47
9   8 29 39 48
10  9 30 40 49
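
The question also asks about lag lengths other than 1. A minimal sketch, assuming the same df as above and that the n argument of dplyr::lag() is the length you want to control; the names lag2, mixed and extra are only illustrative:

```r
library(dplyr)

df <- data.frame(a = 1:10, b = 21:30, c = 31:40, d = 41:50)

# Lag a and d by 2 rows; b and c are left untouched
lag2 <- df %>% mutate(across(c(a, d), ~ dplyr::lag(.x, n = 2)))

# A different lag length per column: a by 1, d by 3
mixed <- df %>%
  mutate(a = dplyr::lag(a, n = 1),
         d = dplyr::lag(d, n = 3))

# Keep the originals and add lagged copies named <column>_lag1, <column>_lag2
extra <- df %>%
  mutate(across(c(a, d),
                list(lag1 = ~ dplyr::lag(.x, n = 1),
                     lag2 = ~ dplyr::lag(.x, n = 2))))
```

The last form may be the most convenient for regressions, since the original columns stay in place next to their lagged versions.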
mac
4

Have a look at mutate_at or mutate_if:

library(dplyr)
df = tibble(a = LETTERS[1:10], b = 21:30,c=31:40)

# exclude column a
df %>% 
  mutate_at(vars(-a), lag)
#> # A tibble: 10 x 3
#>    a         b     c
#>    <chr> <int> <int>
#>  1 A        NA    NA
#>  2 B        21    31
#>  3 C        22    32
#>  4 D        23    33
#>  5 E        24    34
#>  6 F        25    35
#>  7 G        26    36
#>  8 H        27    37
#>  9 I        28    38
#> 10 J        29    39
# only column b, lagged by 4
df %>% 
  mutate_at(c("b"), lag, 4)
#> # A tibble: 10 x 3
#>    a         b     c
#>    <chr> <int> <int>
#>  1 A        NA    31
#>  2 B        NA    32
#>  3 C        NA    33
#>  4 D        NA    34
#>  5 E        21    35
#>  6 F        22    36
#>  7 G        23    37
#>  8 H        24    38
#>  9 I        25    39
#> 10 J        26    40
# only character columns, lagged by 3
df %>% 
  mutate_if(is.character, lag, 3)
#> # A tibble: 10 x 3
#>    a         b     c
#>    <chr> <int> <int>
#>  1 <NA>     21    31
#>  2 <NA>     22    32
#>  3 <NA>     23    33
#>  4 A        24    34
#>  5 B        25    35
#>  6 C        26    36
#>  7 D        27    37
#>  8 E        28    38
#>  9 F        29    39
#> 10 G        30    40

Created on 2020-04-20 by the reprex package (v0.3.0)

Frank Zhang
  • I tried the "only column b" example, `df %>% mutate_at(c("b"),lag,4)`, on my desired columns, but it doesn't seem to do anything at all. The new dataframe I assigned the result to looks exactly like the one before applying the code (lags) – Andycode Apr 20 '20 at 12:34
  • @Andycode it's better to show your real dataset and the code you have tried so far in your question. – Frank Zhang Apr 20 '20 at 12:36
  • `macro_2 <- macro %>% mutate_at(c("inc_ldiff","unem_ldiff", "hp_ldiff", "int_diff", "m1_ldiff"),lag,2)` This was the code I applied. `macro` is my dataframe of macro variables, with a date column, a quarter column, a year column and some dummy columns. Yet it didn't do anything to the selected columns in my dataframe. – Andycode Apr 20 '20 at 13:14
  • @Andycode, can you try changing `lag` to `dplyr::lag` to see if that helps? – Frank Zhang Apr 20 '20 at 13:22
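
One plausible reason for the suggestion above (an assumption, since the thread does not show the asker's session): if `lag` resolves to `stats::lag()` rather than `dplyr::lag()`, for example because dplyr is not attached or another package masks it, the values are not shifted at all. `stats::lag()` only moves the time-series index of its input. A small sketch of the difference, using an illustrative vector `x`:

```r
x <- 21:30

# stats::lag() adjusts the time-series attributes only; the values stay in place
from_stats <- as.vector(stats::lag(x, 2))

# dplyr::lag() shifts the values themselves and pads the start with NA
from_dplyr <- dplyr::lag(x, 2)

print(from_stats)
print(from_dplyr)
```

Writing `dplyr::lag` explicitly inside `mutate_at()` or `across()` avoids depending on which package happens to be attached last.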