I would like to perform a linear interpolation in a variable of a data frame which takes into account the: 1) time difference between the two points, 2) the moment when the data was taken and 3) the individual taken for measure the variable.
For example in the next dataframe:
df <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
Individuals=c(1,1,1,1,1,1,1,2,2,2),
Value=c(1, 2, 3, NA, 5, NA, 7, 5, NA, 7))
df
I would like to obtain:
result <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
Individuals=c(1,1,1,1,1,1,1,2,2,2),
Value=c(1, 2, 3, 4, 5, 6, 7, 5, 5.5, 6))
result
I cannot use exclusively the function na.approx
of the package zoo
because all observations are not consecutives, some observations belong to one individual and other observations belong to other ones. The reason is because if the second individual would have its first obsrevation with NA
and I would use exclusively the function na.approx
, I would be using information from the individual==1
to interpolate the NA
of the individual==2
(e.g the next data frame would have sucherror)
df_2 <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
Individuals=c(1,1,1,1,1,1,1,2,2,2),
Value=c(1, 2, 3, NA, 5, NA, 7, NA, 5, 7))
df_2
I have tried using the packages zoo
and dplyr
:
library(dplyr)
library(zoo)
proof <- df %>%
group_by(Individuals) %>%
na.approx(df$Value)
But I cannot perform group_by
in a zoo
object.
Do you know how to interpolate NA
values in one variable by groups?
Thanks in advance,