R: Interpolation multiple columns by group using target values column

Question

Have a 15x6 dataframe:

df <- data.frame(PART = c("A7","A7","A7","A7","A7","A1","A1","A1","A1","A1","A7","A7","A7","A7","A7"),
                 LIMIT = c(50,50,50,50,50,55,55,55,55,55,52.5,52.5,52.5,52.5,52.5),
                 MEAS = c(14.008,19.053,22.244,24.554,25.521,18.495,22.3,24.867,26.825,27.169,15.299,20.239,23.384,25.606,26.516),
                 MEAS_TARGET = c(16.5,16.5,16.5,16.5,16.5,21.2,21.2,21.2,21.2,21.2,21.5,21.5,21.5,21.5,21.5),
                 INT = c(1.5,2.5,3.5,4.5,5,2,3,4,5,5.2,1.5,2.5,3.5,4.5,5),
                 COL = c(-31.845,-25.51,-21.377,-18.537,-17.546,-41.1,-39.294,-36.813,-33.779,-33.361,-53.589,-49.664,-46.836,-43.581,-40.64))

I am trying to group by PART and LIMIT columns and using linear interpolation find the missing values in INT and COL columns when MEAS = MEAS_TARGET and create the following 18x6 dataframe result:

result <- data.frame(PART = c("A7","A7","A7","A7","A7","A1","A1","A1","A1","A1","A7","A7","A7","A7","A7","A7","A1","A7"),
                 LIMIT = c(50,50,50,50,50,55,55,55,55,55,52.5,52.5,52.5,52.5,52.5,50,55,52.5),
                 MEAS = c(14.008,19.053,22.244,24.554,25.521,18.495,22.3,24.867,26.825,27.169,15.299,20.239,23.384,25.606,26.516,16.5,21.2,21.5),
                 MEAS_TARGET = c(16.5,16.5,16.5,16.5,16.5,21.2,21.2,21.2,21.2,21.2,21.5,21.5,21.5,21.5,21.5,16.5,21.2,21.5),
                 INT = c(1.5,2.5,3.5,4.5,5,2,3,4,5,5.2,1.5,2.5,3.5,4.5,5,1.99,2.7,2.9),
                 COL = c(-31.845,-25.51,-21.377,-18.537,-17.546,-41.1,-39.294,-36.813,-33.779,-33.361,-53.589,-49.664,-46.836,-43.581,-40.64,-28.716,-39.816,-48.53))

I tried to create NA rows for each group and use this and this but couldn't make it work. Any advice on this would be greatly appreciated.

score 2 · Accepted Answer · answered Feb 26 '21 at 01:59

2

We can use complete to include new rows where MEAS = MEAS_TARGET and interpolate INT and COL columns with zoo::na.approx.

library(dplyr)
library(tidyr)

df %>%
  group_by(PART, LIMIT) %>%
  complete(MEAS = unique(c(MEAS, MEAS_TARGET))) %>%
  mutate(across(c(INT, COL), zoo::na.approx)) %>%
  fill(MEAS_TARGET) %>%
  ungroup

answered Feb 26 '21 at 01:59

Ronak Shah

377,200
20
156
213

thanks, that's the solution I was looking for. – scyrt Feb 26 '21 at 02:10
one question: when `MEAS_TARGET` is smaller than `MEAS` I get an error. for example replacing `16.5` with `13.5` in `MEAS_TARGET` column. is there a fix for that? – scyrt Feb 26 '21 at 22:50
1

You can set `na.rm = FALSE` in `na.approx`, since there are no values to interpolate it will remain `NA`. `mutate(across(c(INT, COL), zoo::na.approx, na.rm = FALSE)) %>%` – Ronak Shah Feb 27 '21 at 00:26

R: Interpolation multiple columns by group using target values column

1 Answers1