I have a huge dataset with occasional missing values (NA), and each one needs to be replaced by the average of the value before it and the value after it. I don't want to simply carry the last value forward when a value is NA; I want a simple interpolation using that average.
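Here is a minimal vectorized sketch of that rule. It assumes the values sit in a plain numeric vector, and the helper name `fill_na_mean` is hypothetical, not from any package. Each NA is replaced by the mean of its immediate neighbours. A neighbour that is itself NA or falls outside the vector is ignored, and an NA with no usable neighbour stays NA.

```r
# Hypothetical helper (not from any package): replace each NA in a numeric
# vector with the mean of its immediate neighbours x[k - 1] and x[k + 1].
# Neighbours that are NA or fall outside the vector are ignored; if neither
# neighbour is available, the value stays NA.
fill_na_mean <- function(x) {
  idx <- which(is.na(x))
  if (length(idx) == 0) return(x)
  n <- length(x)
  prev <- c(NA, x[-n])[idx]  # value just before each NA (NA at position 1)
  nxt  <- c(x[-1], NA)[idx]  # value just after each NA (NA at position n)
  m <- rowMeans(cbind(prev, nxt), na.rm = TRUE)
  m[is.nan(m)] <- NA         # rowMeans gives NaN when both neighbours are NA
  x[idx] <- m
  x
}

fill_na_mean(c(0, 0, NA, 0, -0.01, 0.03))  # -> c(0, 0, 0, 0, -0.01, 0.03)
```

Because all neighbours are read from the original vector before any assignment, two adjacent NAs do not fill each other. For example, the last element of `c(0, 0, 0.02, 0.04, NA, NA)` stays NA.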
I have succeeded using two for loops and an if statement:
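# t1 (the first column) is not shown in the question
t2 <- c(0, 0, 0.02, 0.04, NA, NA)
t3 <- c(0, 0, NA, 0, -0.01, 0.03)
t4 <- c(0, -0.02, 0.01, 0, 0, -0.02)
df <- data.frame(t1, t2, t3, t4)
df.save <- df
# Start at row 2 and column 2 so that i - 1 and j - 1 are valid indices
for (i in 2:nrow(df)) {
  for (j in 2:ncol(df)) {
    if (is.na(df[i, j]) && !is.na(df[i - 1, j - 1])) {
      # mean() averages a single vector: mean(a, b) passes b as the `trim`
      # argument instead of averaging a and b, so combine them with c()
      df[i, j] <- mean(c(df[i, j - 1], df[i, j + 1]))
    }
  }
}
df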
I am sure this is not efficient at all, and it is not even general: the way I wrote the code, I have to start searching for NAs from the second row and the second column onward. I think lapply could help here, but I couldn't get anything working with it. Any ideas?
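One way `lapply` could fit, sketched under assumptions: a hypothetical per-vector helper, `fill_na_mean()`, replaces each NA with the mean of its immediate neighbours, and `lapply()` applies it to every column. Each NA is then averaged with the values above and below it. The loop in the question averages across columns within a row instead (`df[i, j - 1]` and `df[i, j + 1]`), so the sketch also shows `apply()` over rows followed by `t()` for that direction. The example drops `t1` because its values are not given in the question.

```r
# Hypothetical helper: replace each NA with the mean of its immediate
# neighbours, ignoring neighbours that are NA or out of range.
fill_na_mean <- function(x) {
  idx <- which(is.na(x))
  if (length(idx) == 0) return(x)
  n <- length(x)
  prev <- c(NA, x[-n])[idx]
  nxt  <- c(x[-1], NA)[idx]
  m <- rowMeans(cbind(prev, nxt), na.rm = TRUE)
  m[is.nan(m)] <- NA
  x[idx] <- m
  x
}

# Example data from the question, without t1 (its values are not shown)
t2 <- c(0, 0, 0.02, 0.04, NA, NA)
t3 <- c(0, 0, NA, 0, -0.01, 0.03)
t4 <- c(0, -0.02, 0.01, 0, 0, -0.02)
df <- data.frame(t2, t3, t4)

# Column-wise: neighbours are the rows above and below.
# Assigning into df[] keeps the result a data.frame instead of a list.
by_col <- df
by_col[] <- lapply(by_col, fill_na_mean)

# Row-wise (the direction of the loop): neighbours are the columns to the
# left and right. apply() returns one column per row, so transpose back.
by_row <- as.data.frame(t(apply(df, 1, fill_na_mean)))

by_col
by_row
```

Neither version needs to skip the first row or column, because missing neighbours at the edges are simply ignored.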
EDIT 1: Rui's answer was perfect, but when formulating my example I forgot to consider the case in which two NAs follow each other, as in the last two values of t2 above.
In this case we get an error:
Error in rowMeans(cbind(x[prev], x[nxt]), na.rm = TRUE) :
'x' must be numeric
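When NAs come in runs, one option is to average the nearest non-missing value before each NA and the nearest non-missing value after it. This is a sketch that assumes "preceding" and "following" mean exactly that, and `fill_na_runs` is a hypothetical name. Every NA in a run then receives the same fill, and an NA next to another NA still gets a value. If a straight-line interpolation across the gap is acceptable instead of a flat average, `zoo::na.approx()` from the zoo package does linear interpolation.

```r
# Hypothetical helper: replace every NA with the mean of the nearest
# non-missing value before it and the nearest non-missing value after it.
# All NAs in one run therefore get the same value. At either end of the
# vector only the one available side is used; an all-NA vector stays NA.
fill_na_runs <- function(x) {
  ok  <- !is.na(x)
  pos <- seq_along(x)
  n   <- length(x)
  # Position of the last non-NA value at or before each element (0 if none)
  last_ok <- cummax(ifelse(ok, pos, 0L))
  # Position of the next non-NA value at or after each element (n + 1 if none)
  next_ok <- rev(cummin(rev(ifelse(ok, pos, n + 1L))))
  prev <- c(NA, x)[last_ok + 1L]  # index 0 maps to the leading NA
  nxt  <- c(x, NA)[next_ok]       # index n + 1 maps to the trailing NA
  m <- rowMeans(cbind(prev, nxt), na.rm = TRUE)
  m[is.nan(m)] <- NA              # no non-missing value on either side
  x[!ok] <- m[!ok]
  x
}

fill_na_runs(c(1, NA, NA, 4))              # -> c(1, 2.5, 2.5, 4)
fill_na_runs(c(0, 0, 0.02, 0.04, NA, NA))  # -> c(0, 0, 0.02, 0.04, 0.04, 0.04)
```

Since it works on one vector at a time, it can be applied column-wise with `df[] <- lapply(df, fill_na_runs)` or row-wise with `apply()`. The trailing NAs in `t2` have no value after them, so they take the last observed value. Whether that is acceptable at the edges is a judgment call for the data at hand.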