Assuming I have a panel data as follows, which was edited from this link:
df <- structure(list(id = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("M01",
"M02", "S01"), class = "factor"), date = structure(c(2L, 3L,
4L, 5L, 6L, 7L, 8L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L), .Label = c("2020-12", "2021-01", "2021-02",
"2021-03", "2021-04", "2021-05", "2021-06", "2021-07"), class = "factor"),
actual = c(3.4, 5.4, 7.4, 7.4, 7.5, 8, 8.9, 10.8, 10.1, 8.2,
10.1, 9.4, 10.1, 9.4, -0.3, NA, NA, 8.6, NA, NA, 8.3, NA),
pred = c(3.288889774, 5.819407687, 6.705608369, 6.054457292,
5.582409131, 7.01052472, 9.742902434, 10.98571396, 6.522003651,
9.688977242, 10.39801463, 9.398991615, 9.764616936, 9.855033457,
0.493311422, 8.403722942, 8.174854517, 8.573117852, 8.403065801,
8.684289455, 8.719079247, 8.259439468)), class = "data.frame", row.names = c(NA,
-22L))
After groupby id
, for each month's actual and predicted values, I need to check if current month's actual and predicted values with same directions comparing to previous month's actual values. The logic of creating new columns is: if the difference values of two months' are positive
, negative
or zero
s, then return increase
, decrease
and unchanged
respectively, if either or both of values are NA
s, then return NA
.
df %>%
# mutate(year = as.integer(year)) %>%
group_by(id) %>%
# arrange(date) %>%
mutate(act_direction = case_when(actual > lag(actual) ~ "increase",
actual < lag(actual) ~ "decrease",
actual == lag(actual) ~ "unchanged"),
pred_direction = case_when(pred > lag(actual) ~ "increase",
pred < lag(actual) ~ "decrease",
pred == lag(actual) ~ "unchanged"))
If all id
are monthly data, the code above works smoothly. But for this example, we will have exception: let's say for id=='S01'
, it's a quarterly data instead of monthly, so I'll need compare values ie., in 2021-03
's with 2020-12
's instead of 2021-02
, same logic for other months.
How could I modify the code to coordinate with this case? Thanks.
The expected outcome:
id date actual pred act_direction pred_direction
1 M01 2021-01 3.4 3.2888898
2 M01 2021-02 5.4 5.8194077 increase increase
3 M01 2021-03 7.4 6.7056084 increase increase
4 M01 2021-04 7.4 6.0544573 unchanged decrease
5 M01 2021-05 7.5 5.5824091 increase decrease
6 M01 2021-06 8.0 7.0105247 increase decrease
7 M01 2021-07 8.9 9.7429024 increase increase
8 M02 2021-01 10.8 10.9857140
9 M02 2021-02 10.1 6.5220037 decrease decrease
10 M02 2021-03 8.2 9.6889772 decrease decrease
11 M02 2021-04 10.1 10.3980146 increase increase
12 M02 2021-05 9.4 9.3989916 decrease decrease
13 M02 2021-06 10.1 9.7646169 increase increase
14 M02 2021-07 9.4 9.8550335 decrease decrease
15 S01 2020-12 -0.3 0.4933114
16 S01 2021-01 NA 8.4037229
17 S01 2021-02 NA 8.1748545
18 S01 2021-03 8.6 8.5731179 increase increase # compare with S01's actual value in 2020-12
19 S01 2021-04 NA 8.4030658
20 S01 2021-05 NA 8.6842895
21 S01 2021-06 8.3 8.7190792 decrease increase # compare with S01's actual value in 2021-03
22 S01 2021-07 NA 8.2594395