impute median plus jitter

Question

I would like to efficiently impute missing values with a slightly different value in each cell.

For example:

df <- data_frame(x = rnorm(100), y = rnorm(100))
df[1:5,1] <- NA
df[1:5, 2] <- NA

df %<>% mutate_all(funs(ifelse(is.na(.), jitter(median(., na.rm = TRUE)), .)))

However, this imputes the same number into every missing cell. How can I add different noise to each cell? Of course, I could do this with a loop, but my data frame is huge and I would like to do this efficiently.

Maybe use `rep(median(., na.rm=TRUE), length(someVariable))` or similar as your argument to `jitter`. — lmo, Apr 14 '19 at 12:47

score 0 · Accepted Answer · answered Apr 14 '19 at 13:14


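In the original code, `jitter` is applied to a single median, so it returns one noisy value that is recycled into every missing cell. We can use `rep` with `n()` to repeat the median once per row, so that `jitter` adds separate noise to each element: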

library(dplyr)
library(magrittr)
df %<>%
   mutate_all(list(~ case_when(is.na(.) ~ jitter(rep(median(., na.rm = TRUE), n())),
         TRUE ~ .)))
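
To make the recycling behaviour concrete, here is a minimal base R sketch on toy data. The vector `x`, the data frame `toy`, and the helper `impute_jitter` are hypothetical names introduced only for illustration; they are not part of the answer above. The helper assumes each column has at least one non-missing value.

```r
set.seed(1)  # only to make the sketch reproducible

x <- c(NA, NA, NA, 2, 4, 6)
med <- median(x, na.rm = TRUE)

# jitter() on one value returns one noisy number; ifelse() then recycles it,
# so every missing cell receives the same fill value.
same_fill <- ifelse(is.na(x), jitter(med), x)

# jitter() on a vector of repeated medians adds separate noise to each element.
diff_fill <- ifelse(is.na(x), jitter(rep(med, length(x))), x)

# Hypothetical helper: draw noise only for the cells that are missing.
impute_jitter <- function(v) {
  miss <- is.na(v)
  if (any(miss)) {
    v[miss] <- jitter(rep(median(v, na.rm = TRUE), sum(miss)))
  }
  v
}

# Apply the helper column by column; df[] keeps the data frame structure.
toy <- data.frame(x = c(NA, NA, 1, 2, 3), y = c(NA, NA, 10, 20, 30))
toy[] <- lapply(toy, impute_jitter)
```

The helper generates noise only for the missing cells rather than for every row, which may matter on a very large data frame. The size of the noise can be controlled through the `factor` and `amount` arguments of `jitter`.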

akrun