Find and replace missing values with row mean

Question

I have a data frame with NAs and I want to replace the NAs with row means

c1 = c(1,2,3,NA)
c2 = c(3,1,NA,3)
c3 = c(2,1,3,1)

df = data.frame(c1,c2,c3)

> df
  c1 c2 c3
1  1  3  2
2  2  1  1
3  3 NA  3
4 NA  3  1

so that each NA is replaced by the mean of the non-missing values in its row (3 for row 3, 2 for row 4).

score 11 · Accepted Answer · answered Jul 23 '13 at 14:23

Very similar to @baptiste's answer

> ind <- which(is.na(df), arr.ind=TRUE)
> df[ind] <- rowMeans(df,  na.rm = TRUE)[ind[,1]]
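
To see why the `[ind[,1]]` step matters, here is a small sketch on the question's data. `which(..., arr.ind=TRUE)` lists the NA positions column by column, so the row means have to be looked up by each position's row index rather than taken in order. The variable name `means` is only for illustration.

```r
c1 <- c(1, 2, 3, NA)
c2 <- c(3, 1, NA, 3)
c3 <- c(2, 1, 3, 1)
df <- data.frame(c1, c2, c3)

# Positions of the NAs as a two-column (row, col) matrix, in column-major order:
# the NA in c1 (row 4) is listed before the NA in c2 (row 3)
ind <- which(is.na(df), arr.ind = TRUE)

# One mean per row, ignoring NAs
means <- rowMeans(df, na.rm = TRUE)

# Pick the mean that belongs to each NA's row, then assign by matrix index
df[ind] <- means[ind[, 1]]
df
```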

Jilber Urbina

I found if I have whole rows of NAs, an error occurs. Is it proper etiquette to pose this as a whole new question? – Brian Jul 24 '13 at 14:39
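
For a row that is entirely NA there is nothing to average, and `rowMeans(..., na.rm = TRUE)` returns `NaN` for it. A sketch of one possible way to handle that, assuming such rows should simply stay `NA` (the example data and the name `means` are made up for illustration):

```r
# Row 2 is entirely NA; row 3 has a single NA
df <- data.frame(c1 = c(1, NA, 3),
                 c2 = c(3, NA, NA),
                 c3 = c(2, NA, 3))

means <- rowMeans(df, na.rm = TRUE)
means[is.nan(means)] <- NA   # all-NA rows have no mean; keep them as NA

ind <- which(is.na(df), arr.ind = TRUE)
df[ind] <- means[ind[, 1]]
df
```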

score 4 · Answer 2 · answered Jul 23 '13 at 14:20

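I think this works, as long as each row mean is matched back to the row of its NA (`which` lists the positions column by column, not row by row):

ind <- which(is.na(df), arr.ind=TRUE)
df[ind] <- rowMeans(df[!complete.cases(df), ], na.rm=TRUE)[match(ind[,1], which(!complete.cases(df)))]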

baptiste

it's a bit redundant to use both is.na and complete.cases; there's probably a more efficient way in two lines – baptiste Jul 23 '13 at 14:22
Like this perhaps? `idx <- which(is.na(df), arr.ind=TRUE); df[ idx ] <- rowMeans( df[ idx[,1] , ], na.rm=TRUE)` – Simon O'Hanlon Jul 23 '13 at 14:25
@SimonO101 Jilber beat us to it – baptiste Jul 23 '13 at 14:28
Ha ha, yeah I just saw! – Simon O'Hanlon Jul 23 '13 at 14:28
I'm leaving this one for the sake of variety; also, the `complete.cases` part might be useful in a different situation – baptiste Jul 23 '13 at 14:30

score 3 · Answer 3 · answered Jul 23 '13 at 14:21

Using apply (note the returned object is a matrix):

t( apply( df , 1 , function(x) { x[ is.na(x) ] = mean( x , na.rm = TRUE ); x } ) )
     c1 c2 c3
[1,]  1  3  2
[2,]  2  1  1
[3,]  3  3  3
[4,]  2  3  1

We use an anonymous function to change each NA in a row to the mean of that row. The only advantage is that you don't have to do any more typing if the number of rows increases. It is not particularly efficient in a computational sense, but it is in a cognitive sense (you won't notice the difference unless you have hundreds of thousands of rows).
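
Since `apply` returns a matrix, one way to keep the result as a data frame is to assign into `df[]`. A minimal sketch, assuming all columns are numeric; `fill_row_mean` is a hypothetical helper name, not part of the answer above.

```r
df <- data.frame(c1 = c(1, 2, 3, NA),
                 c2 = c(3, 1, NA, 3),
                 c3 = c(2, 1, 3, 1))

# Hypothetical helper: replace the NAs in one row with that row's mean
fill_row_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# apply() returns one column per input row, hence the t();
# assigning into df[] keeps the data.frame class and the column names
df[] <- t(apply(df, 1, fill_row_mean))
df
```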

score 2 · Answer 4 · answered Jul 23 '13 at 14:10

My solution is

rwmns = rowMeans(df,na.rm=TRUE)
df$c1[is.na(df$c1)] = rwmns[is.na(df$c1)]
df$c2[is.na(df$c2)] = rwmns[is.na(df$c2)]
df$c3[is.na(df$c3)] = rwmns[is.na(df$c3)]
> df
  c1 c2 c3
1  1  3  2
2  2  1  1
3  3  3  3
4  2  3  1

Is there a more elegant way, especially when someone has many columns?

Brian

Great work coming up with your own solution. You can use `[[` to index instead, so each line becomes `df[[col_name]][is.na(df[[col_name]])] <- rwmns[is.na(df[[col_name]])]`. That way, you can loop or use an apply-family function over the names of the columns you want to perform the replacement on. – Justin Jul 23 '13 at 14:12
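
The `[[` suggestion can be sketched as a loop over the column names. This is one possible reading of the comment, not code from it; `is_missing` is an illustrative variable name.

```r
df <- data.frame(c1 = c(1, 2, 3, NA),
                 c2 = c(3, 1, NA, 3),
                 c3 = c(2, 1, 3, 1))

# Row means are computed once, before any replacement happens
rwmns <- rowMeans(df, na.rm = TRUE)

for (col_name in names(df)) {
  is_missing <- is.na(df[[col_name]])            # which rows are NA in this column
  df[[col_name]][is_missing] <- rwmns[is_missing] # fill them with the row means
}
df
```

To restrict the replacement to some columns, loop over a subset of `names(df)` instead.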

score 2 · Answer 5 · answered Nov 11 '15 at 05:01

Another option is na.aggregate from the zoo package. By default it replaces each NA with its column mean, so transposing the dataset first turns that into a row mean:

library(zoo)
df[] <- t(na.aggregate(t(df)))
df
#  c1 c2 c3
#1  1  3  2
#2  2  1  1
#3  3  3  3
#4  2  3  1

akrun
