I have a large data set in which some values are missing. Each missing value needs to be replaced by the average of the values immediately preceding and following it. I don't want to simply carry the last value forward when a value is NA, but rather to do a simple interpolation using the average.

I have succeeded using two for loops and an if statement:

t2 <- c(0, 0, 0.02, 0.04, NA, NA)
t3 <- c(0, 0, NA, 0, -0.01, 0.03)
t4 <- c(0, -0.02, 0.01, 0, 0, -0.02)
df <- data.frame(t2,t3,t4)

df.save<-df

for(i in 2:nrow(df)){
  for(j in 2:ncol(df)){
    if(is.na(df[i,j]) && !is.na(df[i-1,j-1])){
      # mean() averages only its first argument, so combine both neighbours with c()
      df[i,j] <- mean(c(df[i,j-1], df[i,j+1]))
    }
  }
}

df

I am sure this is not efficient at all, and it is not even general: the way I wrote the code, I have to start searching for NAs from the second row and second column onward. I think lapply could help me here, but I couldn't achieve anything with it. Any ideas?

EDIT 1: Rui's answer was perfect, but when formulating my example I forgot to consider the case in which two NAs follow each other:

t2 <- c(0, 0, 0.02, 0.04, NA, NA)
t3 <- c(0, 0, NA, 0, -0.01, 0.03)
t4 <- c(0, -0.02, 0.01, 0, 0, -0.02)
df <- data.frame(t2,t3,t4)

df.save<-df

for(i in 2:nrow(df)){
  for(j in 2:ncol(df)){
    if(is.na(df[i,j]) && !is.na(df[i-1,j-1])){
      # mean() averages only its first argument, so combine both neighbours with c()
      df[i,j] <- mean(c(df[i,j-1], df[i,j+1]))
    }
  }
}

df

In this case we get an error:

Error in rowMeans(cbind(x[prev], x[nxt]), na.rm = TRUE) : 
  'x' must be numeric
Laura K

2 Answers


The following function does what the question asks for.

meanNA <- function(x){
  na <- is.na(x)
  prev <- c(na[-1], FALSE)
  nxt  <- c(FALSE, na[-length(x)])
  x[na] <- rowMeans(cbind(x[prev], x[nxt]), na.rm = TRUE)
  is.na(x) <- is.nan(x)
  x
}

df[] <- lapply(df, meanNA)

df
#    t2    t3    t4
#1 0.00  0.00  0.00
#2 0.00  0.00 -0.02
#3 0.02  0.00  0.01
#4 0.04  0.00  0.00
#5 0.04 -0.01  0.00
#6   NA  0.03 -0.02
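
For runs of consecutive NAs, one possible alternative is linear interpolation over the row index with base R's approx(). For an isolated NA this gives the same value as averaging the two neighbours. The following is only a sketch under the assumption that every column is numeric; interp_na and df2 are hypothetical names introduced for illustration and are not part of the function above.

```r
# Sketch: fill interior NAs by linear interpolation over the position index.
# interp_na is a hypothetical helper name used only for this illustration.
interp_na <- function(x) {
  ok <- !is.na(x)
  # approx() needs at least two known points; otherwise return x unchanged
  if (sum(ok) < 2) return(x)
  # rule = 1 leaves positions outside the known range (leading/trailing NAs) as NA
  approx(x = which(ok), y = x[ok], xout = seq_along(x), rule = 1)$y
}

t2 <- c(0, 0, 0.02, 0.04, NA, NA)
t3 <- c(0, 0, NA, 0, -0.01, 0.03)
t4 <- c(0, -0.02, 0.01, 0, 0, -0.02)
df2 <- data.frame(t2, t3, t4)

# Apply column by column, keeping the data frame structure
df2[] <- lapply(df2, interp_na)
df2
```

With this sketch the trailing NAs in t2 stay NA, because there is no following value to interpolate towards. Whether that is preferable to repeating the last known value depends on the data.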
Rui Barradas
  • That is great, Rui. Thank you very much. – Laura K Jul 19 '19 at 19:11
  • One question: how to avoid an error if you have two NAs following each other? I have added to the example above. – Laura K Jul 19 '19 at 19:30
  • @LauraK The other example is exactly the same as the first one. Anyway, the error message is about `x` not being numeric. Run `sapply(df, class)` and tell me the result, maybe a vector is being created as character or factor. – Rui Barradas Jul 19 '19 at 20:35
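
The class check suggested in the last comment could look roughly like the sketch below. The data frame df_chr is a made-up example in which one column was read in as character, which is one way to trigger the "'x' must be numeric" error from rowMeans(). The conversion step assumes every non-numeric column really holds numbers stored as text.

```r
# Hypothetical example: t3 has been read in as character instead of numeric
df_chr <- data.frame(
  t2 = c(0, 0, 0.02, 0.04, NA, NA),
  t3 = c("0", "0", NA, "0", "-0.01", "0.03"),
  stringsAsFactors = FALSE
)

# Inspect the class of every column
classes <- sapply(df_chr, class)
print(classes)

# Convert the non-numeric columns; as.character() first also covers factors
not_num <- !sapply(df_chr, is.numeric)
df_chr[not_num] <- lapply(df_chr[not_num],
                          function(col) as.numeric(as.character(col)))

print(sapply(df_chr, class))
```

Entries that cannot be parsed as numbers become NA with a warning during the conversion, so it may be worth counting the NAs before and after.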

Using this answer as an example:

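df <- t(df.save)  # transpose so that each original row becomes a column
for(i in 2:ncol(df)){
  idx <- which(is.na(df[,i]))
  # skip NAs in the first or last position: they lack a neighbour on one side
  idx <- idx[idx != 1 & idx != nrow(df)]
  if(length(idx) > 0){
    # mean() takes a single vector; mean(a, b) would treat b as the trim argument
    df[idx, i] <- sapply(idx, function(x) mean(c(df[x-1,i], df[x+1, i])))
  }
}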
Andrew