I'm struggling with the right syntax for applying multiple successive operations across columns in dplyr. Given this data:
df <- structure(list(A1 = c(838.611, 824.048, 668.901, 225.075, 0,
0, 341.291, 0, 101.652, 127.341, 0, 297.092, 0, 0, 0, 0, 0, 764.737,
759.51, 772.21), A2 = c(499.041, 492.997, 486.132, 469.503, 476.782,
464.18, 469.833, 462.317, 455.507, 441.47, 490.147, 430.844,
0, 0, 0, 0, 0, 0, 0, 124.068)), row.names = c(NA, 20L), class = "data.frame")
Say I want to apply the following changes across columns A1 and A2:
1. replace 0 with NA
2. set outliers to NA
3. interpolate NA
The following syntax performs only change 1, not changes 2 and 3:
library(dplyr)
library(zoo)
df %>%
  mutate(across(starts_with("A"),
                ~na_if(., 0),
                ~ifelse(. %in% boxplot(.)$out, NA, .),
                ~na.approx(., na.rm = FALSE, rule = 2)))
A1 A2
1 838.611 499.041
2 824.048 492.997
3 668.901 486.132
4 225.075 469.503
5 NA 476.782
6 NA 464.180
7 341.291 469.833
8 NA 462.317
9 101.652 455.507
10 127.341 441.470
11 NA 490.147
12 297.092 430.844
13 NA NA
14 NA NA
15 NA NA
16 NA NA
17 NA NA
18 764.737 NA
19 759.510 NA
20 772.210 124.068
EDIT: The correct output comes from this repetitive kind of code, which I'd like to avoid:
df %>%
  mutate(across(starts_with("A"),
                ~na_if(., 0))) %>%
  mutate(across(starts_with("A"),
                ~ifelse(. %in% boxplot(.)$out, NA, .))) %>%
  mutate(across(starts_with("A"),
                ~na.approx(., na.rm = FALSE, rule = 2)))
A1 A2
1 838.6110 499.041
2 824.0480 492.997
3 668.9010 486.132
4 225.0750 469.503
5 263.8137 476.782
6 302.5523 464.180
7 341.2910 469.833
8 221.4715 462.317
9 101.6520 455.507
10 127.3410 441.470
11 212.2165 490.147
12 297.0920 430.844
13 375.0328 430.844
14 452.9737 430.844
15 530.9145 430.844
16 608.8553 430.844
17 686.7962 430.844
18 764.7370 430.844
19 759.5100 430.844
20 772.2100 430.844
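One possible way to avoid the repetition, offered as a sketch rather than a definitive answer: wrap the three steps in a single helper and pass that one function to across(). The name clean_column is hypothetical, not something dplyr or zoo provides. As far as I can tell, across() takes a single function (or a list of functions) as its second argument, so the second and third formulas in the failing attempt are likely passed along as extra arguments to the first lambda and silently ignored. The sketch also assumes boxplot(x, plot = FALSE) is an acceptable stand-in for boxplot(x), since it returns the same outliers without drawing a plot.

```r
library(dplyr)
library(zoo)

# Data from the question
df <- structure(list(A1 = c(838.611, 824.048, 668.901, 225.075, 0,
0, 341.291, 0, 101.652, 127.341, 0, 297.092, 0, 0, 0, 0, 0, 764.737,
759.51, 772.21), A2 = c(499.041, 492.997, 486.132, 469.503, 476.782,
464.18, 469.833, 462.317, 455.507, 441.47, 490.147, 430.844,
0, 0, 0, 0, 0, 0, 0, 124.068)), row.names = c(NA, 20L), class = "data.frame")

# Hypothetical helper: runs the three steps in order on one column
clean_column <- function(x) {
  x <- na_if(x, 0)                                        # 1. replace 0 with NA
  x <- ifelse(x %in% boxplot(x, plot = FALSE)$out, NA, x) # 2. set outliers to NA
  na.approx(x, na.rm = FALSE, rule = 2)                   # 3. interpolate NA
}

# A single across() call applies the whole sequence to each matching column
result <- df %>%
  mutate(across(starts_with("A"), clean_column))

print(result)
```

Because each step depends on the result of the previous one, the steps have to be composed inside one function; passing them as a list to across() would instead apply each one separately to the original column and create a new column per function.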