I have a data frame df with a column of meter readings. Some values are sporadically missing (NA).

df excerpt:

row   time      meter_reading
1     03:10:00  26400
2     03:15:00  NA
3     03:20:00  27200
4     03:25:00  28000
5     03:30:00  NA
6     03:35:00  NA
7     03:40:00  30000

What I'm trying to do:

If there is only a single NA between two valid readings, I want to interpolate it (e.g. na.interpolation for row 2). But if there are two or more consecutive NAs, I don't want R to interpolate; those values should stay NA (e.g. rows 5 and 6).

What I have tried so far is a for loop with an if condition. My approach:

library("imputeTS")
for(i in 1:(nrow(df))) {
  if(!is.na(df$meter_reading[i]) & is.na(df$meter_reading[i-1]) & !is.na(df$meter_reading[i-2])) {
    na_interpolation(df$meter_reading) 
    }
}

Giving me :

Error in if (!is.na(df$meter_reading[i]) & is.na(df$meter_reading[i -  : 
  argument is of length zero

Any ideas how to do it? Am I completely wrong here?

Thanks!

Steffen Moritz
Peha

3 Answers

I don't know what your na.interpolation does, but if you take the mean of the previous and next rows, for example, you could do it with dplyr:

library(dplyr)
df %>% mutate(x=ifelse(is.na(meter_reading),
                       (lag(meter_reading)+lead(meter_reading))/2,
                       meter_reading))
#  row     time meter_reading     x
#1   1 03:10:00         26400 26400
#2   2 03:15:00            NA 26800
#3   3 03:20:00         27200 27200
#4   4 03:25:00         28000 28000
#5   5 03:30:00            NA    NA
#6   6 03:35:00            NA    NA
#7   7 03:40:00         30000 30000
Nicolas2

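A quick look shows that your counter i starts at 1, and you then index at i-1 and i-2. For i = 1, df$meter_reading[0] is a zero-length vector, so the if condition has length zero, which is exactly the error you get.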
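
As a minimal sketch of a loop that avoids the out-of-range index, assuming the readings are in a plain numeric vector (the names x and out are only for illustration), you could start at 2, stop at n-1, and fill an NA only when both neighbours are present. Note that this looks at i-1 and i+1 rather than i-1 and i-2 as in your condition:

```r
# Example data taken from the question
x <- c(26400, NA, 27200, 28000, NA, NA, 30000)

# Index 0 selects nothing, so the condition in the original loop has length zero
length(x[0])

out <- x
n <- length(x)
if (n >= 3) {
  # Only interior positions have both a previous and a next value
  for (i in 2:(n - 1)) {
    # Fill an NA only if it is isolated, i.e. both neighbours are known
    if (is.na(x[i]) && !is.na(x[i - 1]) && !is.na(x[i + 1])) {
      out[i] <- (x[i - 1] + x[i + 1]) / 2
    }
  }
}
out
```

With equally spaced timestamps, the mean of the two neighbours is the same as linear interpolation for a single missing point. Using && instead of & also makes the if condition a single TRUE/FALSE.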

user2974951

Just an addition here: the current version of the imputeTS package also has a maxgap option for each imputation algorithm, which solves this problem easily. It probably wasn't available yet when you asked this question.

Your code would look like this:

library("imputeTS")
na_interpolation(df, maxgap = 1)

This means gaps of 1 NA get imputed, while longer gaps of consecutive NAs remain NA.
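
To illustrate the idea behind maxgap without the package, here is a rough base R sketch. It is not the imputeTS implementation; the function name interpolate_short_gaps is hypothetical, and it assumes a numeric vector with at least two non-NA values and equally spaced observations:

```r
# Hypothetical helper: linear interpolation that keeps NA runs longer than maxgap
interpolate_short_gaps <- function(x, maxgap = 1) {
  idx <- seq_along(x)
  # approx() ignores NA points and interpolates linearly between known values;
  # with the default rule = 1, leading/trailing NAs stay NA
  filled <- approx(idx, x, xout = idx)$y
  # Find runs of consecutive NAs and mark those longer than maxgap
  runs <- rle(is.na(x))
  long_gap <- rep(runs$values & runs$lengths > maxgap, runs$lengths)
  # Put NA back wherever the gap was too long
  filled[long_gap] <- NA
  filled
}

x <- c(26400, NA, 27200, 28000, NA, NA, 30000)
result <- interpolate_short_gaps(x, maxgap = 1)
result
```

With imputeTS itself, applying the function to the single column would look like df$meter_reading <- na_interpolation(df$meter_reading, maxgap = 1).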

Steffen Moritz