
I have a data frame with NAs, and I want to replace each NA with the mean of its row:

c1 = c(1,2,3,NA)
c2 = c(3,1,NA,3)
c3 = c(2,1,3,1)

df = data.frame(c1,c2,c3)

> df
  c1 c2 c3
1  1  3  2
2  2  1  1
3  3 NA  3
4 NA  3  1

so that I get:

> df
  c1 c2 c3
1  1  3  2
2  2  1  1
3  3  3  3
4  2  3  1
Brian

5 Answers


Very similar to @baptiste's answer:

> ind <- which(is.na(df), arr.ind=TRUE)
> df[ind] <- rowMeans(df,  na.rm = TRUE)[ind[,1]]
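The same idea can be wrapped in a small reusable function. This is a sketch; the name `fill_row_means` is hypothetical, and it assumes every column is numeric. `which(is.na(df), arr.ind = TRUE)` returns one (row, col) pair per NA, and indexing the vector of row means by the row column picks the right mean for each NA, however many NAs a row has.

```r
# Hypothetical helper: replace each NA with the mean of its row.
# Assumes all columns of `d` are numeric.
fill_row_means <- function(d) {
  ind <- which(is.na(d), arr.ind = TRUE)   # one (row, col) pair per NA
  row_means <- rowMeans(d, na.rm = TRUE)   # one mean per row
  d[ind] <- row_means[ind[, 1]]            # look up each NA's own row mean
  d
}

df <- data.frame(c1 = c(1, 2, 3, NA),
                 c2 = c(3, 1, NA, 3),
                 c3 = c(2, 1, 3, 1))
fill_row_means(df)
```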
Jilber Urbina
  • I found if I have whole rows of NAs, an error occurs. Is it proper etiquette to pose this as a whole new question? – Brian Jul 24 '13 at 14:39
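On the follow-up about rows that are entirely NA: for such a row `rowMeans(..., na.rm = TRUE)` returns NaN, because there is nothing to average, so the row-mean fill puts NaN into those cells. A minimal sketch that substitutes a fallback value instead; `fill_row_means_safe` and its `fallback` argument are hypothetical names, and leaving such rows as NA by default is an assumption about what you want.

```r
# Hypothetical variant: rows with no observed values get `fallback`
# (NA by default) instead of the NaN that rowMeans() produces.
fill_row_means_safe <- function(d, fallback = NA_real_) {
  row_means <- rowMeans(d, na.rm = TRUE)
  row_means[is.nan(row_means)] <- fallback   # rows that were all NA
  ind <- which(is.na(d), arr.ind = TRUE)
  d[ind] <- row_means[ind[, 1]]
  d
}

df2 <- data.frame(c1 = c(1, NA, NA),
                  c2 = c(NA, NA, 4),
                  c3 = c(3, NA, 2))
fill_row_means_safe(df2)               # row 2 stays NA
fill_row_means_safe(df2, fallback = 0) # row 2 becomes 0
```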

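I think this works:

df[which(is.na(df), arr.ind=TRUE)] <- rowMeans(df[!complete.cases(df), ], na.rm=TRUE)

Caveat: this only lines up when each incomplete row has exactly one NA and the NA positions happen to come out in row order. which(..., arr.ind=TRUE) lists NA positions column by column, while rowMeans() returns the incomplete rows' means in row order, so on the example data the two fills come out swapped (row 4 gets 3 and row 3 gets 2). Looking each mean up by the NA's row number, as in the answer above, avoids this.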
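A quick way to check how NA positions line up with row means, using the question's `df`: `which(..., arr.ind = TRUE)` lists positions in column-major order (down the first column, then the second), so the matching means have to be looked up by row number rather than taken in row order. The sketch below prints both orderings side by side.

```r
df <- data.frame(c1 = c(1, 2, 3, NA),
                 c2 = c(3, 1, NA, 3),
                 c3 = c(2, 1, 3, 1))

ind <- which(is.na(df), arr.ind = TRUE)
ind[, "row"]   # rows of the NAs in column-major order: 4, then 3

# Means of the incomplete rows, in row order (row 3, then row 4): 3 2
unname(rowMeans(df[!complete.cases(df), ], na.rm = TRUE))

# Means looked up per NA, in the same order as `ind`: 2 3
unname(rowMeans(df, na.rm = TRUE)[ind[, "row"]])
```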
baptiste

Using apply (note the returned object is a matrix):

t( apply( df , 1 , function(x) { x[ is.na(x) ] = mean( x , na.rm = TRUE ); x } ) )
     c1 c2 c3
[1,]  1  3  2
[2,]  2  1  1
[3,]  3  3  3
[4,]  2  3  1

We use an anonymous function to replace each NA in a row with the mean of that row. The only advantage is that you don't have to do any more typing as the number of columns grows. It is not particularly efficient computationally, but it is simple to reason about, and you won't notice the cost unless you have hundreds of thousands of rows.
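If you want to keep the result as a data frame rather than a matrix, one option (a sketch, assuming all columns are numeric) is to assign the transposed apply() result back into `df[]`, which replaces the values while keeping the data frame's class and column names.

```r
df <- data.frame(c1 = c(1, 2, 3, NA),
                 c2 = c(3, 1, NA, 3),
                 c3 = c(2, 1, 3, 1))

# apply() returns one column per input row, hence the t()
filled <- t(apply(df, 1, function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)   # fill this row's NAs with its mean
  x
}))

df[] <- filled   # df stays a data.frame; only the values change
df
```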

Simon O'Hanlon

My solution is:

rwmns = rowMeans(df,na.rm=TRUE)
df$c1[is.na(df$c1)] = rwmns[is.na(df$c1)]
df$c2[is.na(df$c2)] = rwmns[is.na(df$c2)]
df$c3[is.na(df$c3)] = rwmns[is.na(df$c3)]
> df
  c1 c2 c3
1  1  3  2
2  2  1  1
3  3  3  3
4  2  3  1

Is there a more elegant way, especially when there are many columns?

Brian
    Great work coming up with your own solution. You can use `[[` to index instead, so each line becomes `df[[col_name]][is.na(df[[col_name]])] <- rwmns[is.na(df[[col_name]])]`. That way, you can loop or use an apply-family function over the names of the columns you want to fill. – Justin Jul 23 '13 at 14:12
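Justin's suggestion, spelled out as a loop over all column names. This is a sketch that reuses the question's `rwmns` and assumes every column is numeric; the row means are computed once, before any filling, so values filled in earlier columns do not shift the means used for later ones.

```r
df <- data.frame(c1 = c(1, 2, 3, NA),
                 c2 = c(3, 1, NA, 3),
                 c3 = c(2, 1, 3, 1))

rwmns <- rowMeans(df, na.rm = TRUE)   # computed once, from the original data

for (col_name in names(df)) {
  miss <- is.na(df[[col_name]])        # rows that are NA in this column
  df[[col_name]][miss] <- rwmns[miss]  # fill them with their row means
}
df
```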

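Another option is na.aggregate from the zoo package, applied to the transposed data. By default na.aggregate replaces NAs with column means, so transposing turns the row means into column means, and transposing back restores the original layout: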

library(zoo)
df[] <- t(na.aggregate(t(df)))
df
#  c1 c2 c3
#1  1  3  2
#2  2  1  1
#3  3  3  3
#4  2  3  1
akrun