0

I hava a very large dataframe with multiple readings (multiple columns) of more than 100 countries for 98 days each.

There are several countries, several features and several dates in the real dataframe. I created a sample to illustrate my challenge. Also the dates are ordered as presented.

Sample data set

#Create sample dataframe
df <- data.frame(Date = seq(ymd("2020-01-01",tz= Sys.timezone()),ymd("2020-01-05",tz= Sys.timezone()),86400),
                 Country=c(rep("USA",5),rep("INDIA",5),rep("POLAND",5),rep("SWITZERLAND",5)),
                 var1= seq(1,20,1),var2= seq(1,20,1),var3=seq(1,20,1),val_n = seq(1,20,1))

df
         Date     Country var1 var2 var3 val_n
1  2020-01-01         USA    1    1    1     1
2  2020-01-02         USA    2    2    2     2
3  2020-01-03         USA    3    3    3     3
4  2020-01-04         USA    4    4    4     4
5  2020-01-05         USA    5    5    5     5
6  2020-01-01       INDIA    6    6    6     6
7  2020-01-02       INDIA    7    7    7     7
8  2020-01-03       INDIA    8    8    8     8
9  2020-01-04       INDIA    9    9    9     9
10 2020-01-05       INDIA   10   10   10    10
11 2020-01-01      POLAND   11   11   11    11
12 2020-01-02      POLAND   12   12   12    12
13 2020-01-03      POLAND   13   13   13    13
14 2020-01-04      POLAND   14   14   14    14
15 2020-01-05      POLAND   15   15   15    15
16 2020-01-01 SWITZERLAND   16   16   16    16
17 2020-01-02 SWITZERLAND   17   17   17    17
18 2020-01-03 SWITZERLAND   18   18   18    18
19 2020-01-04 SWITZERLAND   19   19   19    19
20 2020-01-05 SWITZERLAND   20   20   20    20

Objective The goal is to have the next 4day values of each reading as new columns as seen below. Where there are no next values, NAs should be populated.

expected output

    Date     Country var1 var2 var3 var_n var1_next_1day var1_next_2day var1_next_3day var1_next_4day Same_var2_n
1  1-Jan         USA    1    1    1     1              2              3              4              5           …
2  2-Jan         USA    2    2    2     2              3              4              5             NA           …
3  3-Jan         USA    3    3    3     3              4              5             NA             NA           …
4  4-Jan         USA    4    4    4     4              5             NA             NA             NA           …
5  5-Jan         USA    5    5    5     5             NA             NA             NA             NA           …
6  1-Jan       INDIA    6    6    6     6              7              8              9             10           …
7  2-Jan       INDIA    7    7    7     7              8              9             10             NA           …
8  3-Jan       INDIA    8    8    8     8              9             10             NA             NA           …
9  4-Jan       INDIA    9    9    9     9             10             NA             NA             NA           …
10 5-Jan       INDIA   10   10   10    10             NA             NA             NA             NA           …
11 1-Jan      POLAND   11   11   11    11             12             13             14             15           …
12 2-Jan      POLAND   12   12   12    12             13             14             15             NA           …
13 3-Jan      POLAND   13   13   13    13             14             15             NA             NA           …
14 4-Jan      POLAND   14   14   14    14             15             NA             NA             NA           …
15 5-Jan      POLAND   15   15   15    15             NA             NA             NA             NA           …
16 1-Jan SWITZERLAND   16   16   16    16             17             18             19             20           …
17 2-Jan SWITZERLAND   17   17   17    17             18             19             20             NA           …
18 3-Jan SWITZERLAND   18   18   18    18             19             20             NA             NA           …
19 4-Jan SWITZERLAND   19   19   19    19             20             NA             NA             NA           …
20 5-Jan SWITZERLAND   20   20   20    20             NA             NA             NA             NA           …
James Z
  • 12,209
  • 10
  • 24
  • 44
Yinkafad
  • 41
  • 5

2 Answers2

1

In dplyr, we can use lead with mutate_at to apply it for multiple columns.

library(dplyr)

df %>%
  group_by(Country) %>%
  mutate_at(vars(var1:val_n), list(next_1day = ~lead(.), next_2day = ~lead(., 2),
                               next_3day = ~lead(., 3), next_4day = ~lead(., 4)))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
0
library(data.table)
df <- data.table(df)
#one day
df[,var1_next_1day:=shift(var1,-1),by="Country"]
#second day 
df[,var2_next_2day:=shift(var2,-2),by="Country"]
Miss Chanandler Bong
  • 4,081
  • 10
  • 26
  • 36
Prahlad
  • 118
  • 1
  • 4
  • Thanks Prehlad. But I get this error Error in eval(jsub, setattr(as.list(seq_along(x)), "names", names_x), : object 'var2_next_2day' not found – Yinkafad Apr 12 '20 at 09:58