2

Given a df as follows, for each year's actual and predicted values, I need to check if current year's actual and predicted values with same directions comparing to previous year's actual values:

df <- structure(list(code = c("M0000273", "M0000357", "M0000545"), 
    name = c("industry", "agriculture", "service"), `2019_actual` = c(16.78, 
    9.26, 49.38), `2019_pred` = c(17.78, 10.26, NA), `2020_actual` = c(35.74, 
    NA, 49.38), `2020_pred` = c(36.74, 66.56, 25.36), `2021_actual` = c(30.74, 
    83.42, 63.26), `2021_pred` = c(31.74, 84.42, 35.23)), class = "data.frame", row.names = c(NA, 
-3L))

Out:

      code        name 2019_actual 2019_pred 2020_actual 2020_pred 2021_actual 2021_pred
1 M0000273    industry       16.78     17.78       35.74     36.74       30.74     31.74
2 M0000357 agriculture        9.26     10.26          NA     66.56       83.42     84.42
3 M0000545     service       49.38        NA       49.38     25.36       63.26     35.23

The logic is: if the difference values of two years are positive, negative or zeros, then return increase, decrease and unchanged respectively, if either or both of values are NAs, then return NA.

The expected result:

     code        name 2019_actual 2019_pred 2020_actual 2020_pred 2021_actual 2021_pred 2020_act_direction 2020_pred_direction 2021_act_direction
1 M0000273    industry       16.78     17.78       35.74     36.74       30.74     31.74           increase            increase           decrease
2 M0000357 agriculture        9.26     10.26          NA     66.56       83.42     84.42                               increase                   
3 M0000545     service       49.38        NA       49.38     25.36       63.26     35.23          unchanged            decrease           increase
  2021_pred_direction
1            decrease
2                    
3            decrease

I try with the following code, but it has two issues: 1. it return an error: **Error: unexpected ')' in " )"**; 2. if I have many years (ie., from 2010 to 2020), it's obviously not a right way to get the expected result.

df %>% 
  mutate(
    `2020_act_direction` = case_when(`2020_actual` - `2019_actual` > 0 ~ 'increase',
                                   `2020_actual` - `2019_actual` < 0 ~ 'decrease',
                                   `2020_actual` - `2019_actual` == 0 ~ 'unchanged',
                                   TRUE ~ NA
                                   ),
    `2020_pred_direction` = case_when(`2020_pred` - `2019_actual` > 0 ~ 'increase',
                                   `2020_pred` - `2019_actual` < 0 ~ 'decrease',
                                   `2020_pred` - `2019_actual` == 0 ~ 'unchanged',
                                   TRUE ~ NA
    )
    `2021_act_direction` = case_when(`2021_actual` - `2020_actual` > 0 ~ 'increase',
                                   `2021_actual` - `2020_actual` < 0 ~ 'decrease',
                                   `2021_actual` - `2020_actual` == 0 ~ 'unchanged',
                                   TRUE ~ NA
    )
    `2021_pred_direction` = case_when(`2021_pred` - `2020_actual` > 0 ~ 'increase',
                                   `2021_pred` - `2020_actual` < 0 ~ 'decrease',
                                   `2021_pred` - `2020_actual` == 0 ~ 'unchanged',
                                   TRUE ~ NA
    )
  )

How can I deal with this issue?

Nimantha
  • 6,405
  • 6
  • 28
  • 69
ah bon
  • 9,293
  • 12
  • 65
  • 148

1 Answers1

1

Use pivot_longer and pivot_wider to get one row per year/code/name. Then you can easily use lag to compare between consecutive years.

library(tidyverse)

df <- structure(list(code = c("M0000273", "M0000357", "M0000545"), 
                     name = c("industry", "agriculture", "service"), 
                     `2019_actual` = c(16.78, 9.26, 49.38), 
                     `2019_pred` = c(17.78, 10.26, NA), 
                     `2020_actual` = c(35.74, NA, 49.38), 
                     `2020_pred` = c(36.74, 66.56, 25.36), 
                     `2021_actual` = c(30.74, 83.42, 63.26), 
                     `2021_pred` = c(31.74, 84.42, 35.23)), 
                class = "data.frame", row.names = c(NA, -3L)) %>% 
    as_tibble()

df %>% 
    pivot_longer(cols = c(-code, -name), names_to = c("year", "type"), names_sep = "_") %>% 
    pivot_wider(names_from = "type", values_from = "value") %>% 
    mutate(year = as.integer(year)) %>% 
    group_by(code, name) %>% 
    arrange(year) %>% 
    mutate(act_direction = case_when(actual > lag(actual) ~ "increase",
                                     actual < lag(actual) ~ "decrease",
                                     actual == lag(actual) ~ "unchanged"),
           pred_direction = case_when(pred > lag(actual) ~ "increase",
                                   pred < lag(actual) ~ "decrease",
                                   pred == lag(actual) ~ "unchanged"))
Michael Dewar
  • 2,553
  • 1
  • 6
  • 22