R Missing Value Replacement Function

Question

I have a table with missing values and I'm trying to write a function that will replace the missing values with a calculation based on the nearest two non-zero values.

Example:

For X = 3, Tom = 5.1 + (7.4-5.1)/2.

For X = 4, Tom = (5.1 + (7.4-5.1)/2) + (7.4-5.1)/2

Does this function already exist? If not, any advice would be greatly appreciated.

I'm not aware of a function that does exactly that. The `na.locf` function in pkg::zoo would "carry forward". — IRTFM, Mar 29 '14 at 20:22
I don't quite understand what you meant in calculating the value of Tom for X = 4, could you elaborate? — xwang, Jul 22 '16 at 01:05

score 1 · Answer 1 · answered Mar 29 '14 at 20:56

1

A more usual way to do this (but not equivalent to the question) is to use linear interpolation:

library(zoo)
df <- data.frame(X = 1:5, Tom = c(4.3, 5.1, NA, NA, 7.4))

na.approx(df)

or spline interpolation:

na.spline(df)

answered Mar 29 '14 at 20:56

G. Grothendieck

254,981
17
203
341

Thanks! na.approx() did the trick. Do you know what the formula is that the algorithm uses, or how to look it up? – user3476463 Apr 02 '14 at 22:40
It uses linear interpolation. The R `approx` command does the real work underneath. `?approx` – G. Grothendieck Apr 02 '14 at 22:53

Steffen Moritz · Answer 2 · 2021-11-12T19:24:02.437

Actually the imputeTS package (I am the maintainer) offers a good solutions for this.

Replacement with the Moving Average

 na_ma(x, k = 2)

x is your input object k is the moving average window

k of 1 means you only consider the values before and after k of 2 means you consider the 2 values before and the 2 values after

This function is probably the closest to the required calculation. The only difference is, that the imputeTS method does not jump over NA values. (as required by the thread starter)

But especially for long NA streaks this makes perfectly sense. 1, 2, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 14, 15, 16 (taking the average of 2 and 14 for the NA at position 3 would be no good idea)

Furthermore Last Observation Forward (as mentioned by 42 in the comment)

imputeTS::na_locf(x)

or Interpolation (as also mentioned by G. Grothendieck)

imputeTS::na_interpolation(x)

are also missing data replacement options that go a little bit in the same direction. Here is a introduction to the imputeTS package in the R Journal if you are interested.

Robert Krzyzanowski · Answer 3 · 2014-03-29T20:49:34.353

Just use a loop in this scenario, other approaches are much harder.

for (i in seq_len(nrow(df)) {
  if (is.na(df[i, 'Tom']))
    df[i, 'Tom'] <- ((tmp <- c(0, df$Tom[!is.na(df$Tom)], 0))[i+1] + tmp[i]) / 2 + tmp[i]
}

Example

df <- data.frame(X = seq_len(100), Tom = ifelse(runif(100, 0, 1) > 0.5, NA, round(runif(100, 0, 10), 1)))
head(df)
# X Tom
# 1 1  NA
# 2 1.4
# 3 3  NA
# 4 4 3.9
# 5 5  NA
for (i in seq_len(nrow(df))) { if (is.na(df[i, 'Tom']))  df[i, 'Tom'] <- ((tmp <- c(0, df$Tom[!is.na(df$Tom)], 0))[i+1] + tmp[i]) / 2 + tmp[i] }
head(df)
#  X  Tom
# 1 1 0.70
# 2 2 1.40
# 3 3 4.05
# 4 4 3.90
# 5 5 9.05

R Missing Value Replacement Function

3 Answers3

Example

Linked