
I am trying to impute missing values in a data frame with the impute function from Hmisc. I can impute one column at a time, but I fail when I try to loop over the columns.

The example below works fine, but I would like to make it dynamic using a function:

impute_marks$col1 <- with(impute_marks, round(impute(col1, mean), 0))

My attempt at a function:

impute_dataframe <- function()
{
  for(i in 1:ncol(impute_marks))
  {
    impute_marks[is.na(impute_marks[,i]), i] <- with(impute_marks, round(impute(impute_marks[,i], mean)),0)
  }
}
impute_dataframe 

There is no error when I run the function, but no imputed data is written to the dataset impute_marks either.

Murlidhar Fichadia

2 Answers


Hmisc::impute is already a function, so why not just use apply and skip the for loop?

library(Hmisc)
age1 <- c(1,2,NA,4)
age2 <- c(NA, 4, 3, 1)
mydf <- data.frame(age1, age2)

mydf
  age1 age2
1    1   NA
2    2    4
3   NA    3
4    4    1

apply(mydf, 2, function(x) {round(impute(x, mean))})
  age1 age2
1    1    3
2    2    4
3    2    3
4    4    1

EDIT: apply returns a matrix, so to keep mydf as a data.frame you could coerce the result back like this:

mydf <- as.data.frame(apply(mydf, 2, function(x) {round(impute(x, mean))}))

But what I'd do is use another package, purrr, which is a nice set of tools built around this apply/mapping idea. map_df, for example, always returns a data.frame object, and there are a number of other map_* variants that you can see with ?map.

library(purrr)
map_df(mydf, ~ round(impute(., mean)))

I know base R functions are often preferred, but purrr makes apply-style operations so much easier.
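If you would rather keep the loop from the question, here is a minimal sketch of one way to make it work. It assumes the two reasons nothing changed are that the function modifies a local copy of impute_marks which is never returned, and that typing impute_dataframe without parentheses only prints the function instead of calling it. The df argument, the is.numeric check, and the sample data are illustrative additions, not part of the original code.

```r
library(Hmisc)

# Take the data frame as an argument and return the modified copy,
# because assignments inside a function do not change the caller's object.
impute_dataframe <- function(df) {
  for (i in seq_along(df)) {
    # Only numeric columns can be mean-imputed (assumption for this sketch)
    if (is.numeric(df[[i]])) {
      # Replace the whole column; as.numeric() drops the "impute" class
      df[[i]] <- as.numeric(round(impute(df[[i]], mean)))
    }
  }
  df
}

# Hypothetical sample data standing in for the real impute_marks
impute_marks <- data.frame(col1 = c(1, 2, NA, 4), col2 = c(NA, 4, 3, 1))

# Call the function with parentheses and assign the result back
impute_marks <- impute_dataframe(impute_marks)
```

Assigning the result back is what actually updates impute_marks.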

Nate
  • I am fairly new to R and wasn't aware of the apply function. Thanks for pointing it out. I was able to achieve what I wanted, but I get a matrix rather than a data frame at the end of the operation. How can I convert it to a data frame after imputing? This is what I got: num[1:153, 1:26] 55 68 .... all the values, rather than 153 obs. of 26 variables. – Murlidhar Fichadia Jan 22 '17 at 14:13
  • I resolved it by wrapping the whole right-hand side in as.data.frame(). – Murlidhar Fichadia Jan 22 '17 at 14:18
  • If you find yourself doing this a lot, check out `library(purrr)`. It has nice syntax and I think it is more intuitive than the base `apply`, `lapply`, etc. functions. – Nate Jan 22 '17 at 14:21

We can use na.aggregate from zoo, which can be applied directly to the whole dataset:

library(zoo)
round(na.aggregate(mydf))
#  age1 age2
#1    1    3
#2    2    4
#3    2    3
#4    4    1

Or apply it to each column separately with lapply:

mydf[] <- lapply(mydf, function(x) round(na.aggregate(x)))

By default, na.aggregate fills missing values with the mean, but we can change this through the FUN argument.
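As a small sketch of swapping FUN, the following fills each column with its median instead of its mean. The data frame scores and the result name filled are made up for illustration; the value 100 is only there to show a case where the median and the mean differ noticeably.

```r
library(zoo)

# Hypothetical data with an outlier in column a
scores <- data.frame(a = c(1, 2, NA, 100), b = c(NA, 4, 3, 1))

# Fill each column's NAs with that column's median rather than the mean
filled <- scores
filled[] <- lapply(scores, function(x) na.aggregate(x, FUN = median))
```

Using filled[] <- keeps the data.frame structure, as in the lapply call above.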

akrun