0

A data frame contains 123 columns, and each column has at least one NA value.

I want to replace these NA values with the column median. Because there are so many columns, I cannot write code that refers to each column by name.

So I tried to use 'lapply' to solve this, but it didn't work.

data2[-1]<-lapply(data2[-1],function(x)x - median(x,na.rm=TRUE))

It says it doesn't work because the input is a data frame, not numeric.

Equinox
  • Please provide example data as plain text, not images, so users can copy/paste. You can paste the output from `dput(data2)` or if large, `dput(head(data2))`. – neilfws Jun 05 '17 at 05:41
  • **This is not a duplicate, so don't VtC.** @RonakShah: the question and answer you cite are about matrices, not data frames; the asker had meant to ask about data frames but didn't. I retitled it *"Replacing NA's in each column of matrix with the median of that column"*. So we actually do need a canonical example for a data frame. Might as well use this question. – smci Jun 05 '17 at 06:05

3 Answers

1

We can use `na.aggregate` from the zoo package:

 library(zoo)
 # Pick out the numeric columns, then fill each one's NAs with that column's median
 j1 <- sapply(df1, is.numeric)
 df1[j1] <- na.aggregate(df1[j1], FUN = median)
akrun
1

We can use `map2_df` from purrr:

library(purrr)
df <- data.frame(a = c(1, 2, 3), b = c(2, NA, 9), c = c(NA, 3, 5), d = c(0, 4, NA))
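# Per-column medians come from map() here (dmap() now lives in purrrlyr);
# map2_df() then walks columns and medians in parallel and fills the NAs
purrr::map2_df(df, purrr::map(df, median, na.rm = TRUE), function(x, y) ifelse(is.na(x), y, x))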
0
# Loop over the columns and overwrite each column's NAs with that column's median
for (i in seq_along(df)) {
  df[is.na(df[, i]), i] <- median(df[, i], na.rm = TRUE)
}
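
The same per-column replacement can also be written with `lapply`, which stays closer to the attempt in the question. That attempt subtracts the median from every value instead of filling the gaps, and `median` raises an error on factor columns. What follows is only a minimal base R sketch: the toy `data2` and the helper name `fill_median` are made up for illustration, and it assumes that non-numeric columns should simply be left alone.

```r
# Hypothetical toy data: the first column is an ID, the rest are numeric with NAs
data2 <- data.frame(
  id = c("a", "b", "c", "d"),
  x  = c(1, NA, 3, 10),
  y  = c(NA, 2, 4, 6)
)

# Hypothetical helper: return x with its NAs replaced by the median of the
# non-missing values; non-numeric columns are returned unchanged
fill_median <- function(x) {
  if (!is.numeric(x)) return(x)
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Skip the first column, as in the question, and assign the result back
data2[-1] <- lapply(data2[-1], fill_median)
```

On this toy data the `NA` in `x` becomes 3 and the `NA` in `y` becomes 4. Assigning into `data2[-1]` rather than `data2` keeps the object a data frame and leaves the first column as it was.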
AK88