Assuming I have a panel data as follows, which was edited from this link:
df <- structure(list(id = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("M01",
"M02", "S01"), class = "factor"), date = structure(c(2L, 3L,
4L, 5L, 6L, 7L, 8L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L), .Label = c("2020-12", "2021-01", "2021-02",
"2021-03", "2021-04", "2021-05", "2021-06", "2021-07"), class = "factor"),
actual = c(3.4, 5.4, 7.4, 7.4, 7.5, 8, 8.9, 10.8, 10.1, 8.2,
10.1, 9.4, 10.1, 9.4, -0.3, NA, NA, 8.6, NA, NA, 8.3, NA),
pred = c(3.288889774, 5.819407687, 6.705608369, 6.054457292,
5.582409131, 7.01052472, 9.742902434, 10.98571396, 6.522003651,
9.688977242, 10.39801463, 9.398991615, 9.764616936, 9.855033457,
0.493311422, 8.403722942, 8.174854517, 8.573117852, 8.403065801,
8.684289455, 8.719079247, 8.259439468)), class = "data.frame", row.names = c(NA,
-22L))
After groupby id
, for each month's actual and predicted values, I need to calculate current month's actual and predicted values with previous month's actual values, except: let's say for id=='S01'
, it's a quarterly data instead of monthly, so I'll need calculate difference values by current month's with previous quarter's last month's actual values, ie., in 2021-03's with 2020-12's instead of 2021-02, same logic for other months.
My attempt code:
df %>%
group_by(id) %>%
mutate(actual2=actual) %>%
fill(actual2) %>%
mutate(act_diff = case_when(
actual2 > lag(actual2) ~ actual2 - lag(actual2),
actual2 < lag(actual2) ~ actual2 - lag(actual2),
actual2 == lag(actual2) ~ 0),
pred_diff = case_when(
pred > lag(actual2) ~ pred - lag(actual2),
pred < lag(actual2) ~ pred - lag(actual2),
pred == lag(actual2) ~ 0),
act_diff = ifelse((id=='S01')&is.na(actual), NA, act_diff),
pred_diff = ifelse((id=='S01')&is.na(actual), NA, pred_diff), actual2=NULL) %>%
print(n=22)
Result:
id date actual pred act_diff pred_diff
<fct> <fct> <dbl> <dbl> <dbl> <dbl>
1 M01 2021-01 3.4 3.29 NA NA
2 M01 2021-02 5.4 5.82 2 2.42
3 M01 2021-03 7.4 6.71 2 1.31
4 M01 2021-04 7.4 6.05 0 -1.35
5 M01 2021-05 7.5 5.58 0.100 -1.82
6 M01 2021-06 8 7.01 0.5 -0.489
7 M01 2021-07 8.9 9.74 0.9 1.74
8 M02 2021-01 10.8 11.0 NA NA
9 M02 2021-02 10.1 6.52 -0.700 -4.28
10 M02 2021-03 8.2 9.69 -1.9 -0.411
11 M02 2021-04 10.1 10.4 1.9 2.20
12 M02 2021-05 9.4 9.40 -0.700 -0.701
13 M02 2021-06 10.1 9.76 0.700 0.365
14 M02 2021-07 9.4 9.86 -0.700 -0.245
15 S01 2020-12 -0.3 0.493 NA NA
16 S01 2021-01 NA 8.40 NA NA
17 S01 2021-02 NA 8.17 NA NA
18 S01 2021-03 8.6 8.57 8.9 8.87
19 S01 2021-04 NA 8.40 NA NA
20 S01 2021-05 NA 8.68 NA NA
21 S01 2021-06 8.3 8.72 -0.300 0.119
22 S01 2021-07 NA 8.26 NA NA
How could I modify the code above get an expected outcome like this? Thanks:
id date actual pred act_diff pred_diff
1 M01 2021-01 3.4 3.2888898 NA NA
2 M01 2021-02 5.4 5.8194077 2.0 2.4194077
3 M01 2021-03 7.4 6.7056084 2.0 1.3056084
4 M01 2021-04 7.4 6.0544573 0.0 -1.3455427
5 M01 2021-05 7.5 5.5824091 0.1 -1.8175909
6 M01 2021-06 8.0 7.0105247 0.5 -0.4894753
7 M01 2021-07 8.9 9.7429024 0.9 1.7429024
8 M02 2021-01 10.8 10.9857140 NA NA
9 M02 2021-02 10.1 6.5220037 -0.7 -4.2779963
10 M02 2021-03 8.2 9.6889772 -1.9 -0.4110228
11 M02 2021-04 10.1 10.3980146 1.9 2.1980146
12 M02 2021-05 9.4 9.3989916 -0.7 -0.7010084
13 M02 2021-06 10.1 9.7646169 0.7 0.3646169
14 M02 2021-07 9.4 9.8550335 -0.7 -0.2449665
15 S01 2020-12 -0.3 0.4933114 NA NA
16 S01 2021-01 NA 8.4037229 NA 8.7037229 # calculate with S01's actual value in 2020-12
17 S01 2021-02 NA 8.1748545 NA 8.4748545 # calculate with S01's actual value in 2020-12
18 S01 2021-03 8.6 8.5731179 8.9 8.8731179 # calculate with S01's actual value in 2020-12
19 S01 2021-04 NA 8.4030658 NA -0.1969342 # calculate with S01's actual value in 2021-03
20 S01 2021-05 NA 8.6842895 NA 0.0842895 # calculate with S01's actual value in 2021-03
21 S01 2021-06 8.3 8.7190792 -0.3 0.1190792 # calculate with S01's actual value in 2021-03
22 S01 2021-07 NA 8.2594395 NA -0.0405605 # calculate with S01's actual value in 2021-06