Dealing with NaN when calculating means

Question

I want to create a new column that contains the average of two other columns.
For example, my original table (dat) looks like this:

    A   B
1   1   NaN
2   3   2
3   2   5
4   4   4
5   6   NaN
6   5   3

I now want a column C that averages A and B, so I tried the following

dat$C <- (dat$A + dat$B) / 2

But what I get is this

    A   B     C
1   1   NaN   NaN
2   3   2     2.5
3   2   5     3.5
4   4   4     4
5   6   NaN   NaN
6   5   3     4

When what I want is this

    A   B     C
1   1   NaN   1
2   3   2     2.5
3   2   5     3.5
4   4   4     4
5   6   NaN   6
6   5   3     4

So how can I calculate this new mean value column while working around the missing values in my dataset?

Try `df$C <- rowMeans(df, na.rm = TRUE)` where `df` is your `data.frame` — dickoa, Jan 23 '14 at 22:31
@dickoa Thanks for the help. Unfortunately in my actual dataset I have other identifier columns that I'm not working into the mean so this doesn't work. — melanopygus, Jan 23 '14 at 22:36
Just pass the data.frame subset to rowMeans : `dat$C <- rowMeans(dat[,c('A','B')], na.rm = TRUE)` — digEmAll, Jan 23 '14 at 22:38
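
For reference, here is a minimal sketch of the suggestion in the comments, run against the table from the question. The `id` column is a hypothetical stand-in for the identifier columns the asker mentions; only `A` and `B` are passed to `rowMeans`.

```r
# Rebuild the example table from the question.
# `id` is a hypothetical identifier column, not part of the original data.
dat <- data.frame(
  id = letters[1:6],
  A  = c(1, 3, 2, 4, 6, 5),
  B  = c(NaN, 2, 5, 4, NaN, 3)
)

# Average only A and B; na.rm = TRUE skips NA and NaN within each row.
dat$C <- rowMeans(dat[, c("A", "B")], na.rm = TRUE)

print(dat)
```

Rows 1 and 5 fall back to the value in `A`, because that is the only non-missing entry left to average.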

score 4 · Answer 1 · answered Jan 24 '14 at 00:59

You can also do

dat$C <- apply(dat,1,function(x) mean(na.omit(x)))

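na.omit is useful to know if you want to write a more complex function: it is a standalone function that ships with R (in the stats package) and drops missing values from any vector, whereas na.rm is an argument that only certain functions accept. If your data frame has other columns, pass only the columns you want to average, e.g. dat[,c('A','B')], instead of dat.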
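
As an illustration of the "more complex function" case, here is a rough sketch. The helper name `row_mean_or_na` is made up for this example, and returning `NA` for a row with no usable values is an assumption about the desired behaviour (a plain `mean(na.omit(x))` would give `NaN` there).

```r
dat <- data.frame(
  A = c(1, 3, 2, 4, 6, 5),
  B = c(NaN, 2, 5, 4, NaN, 3)
)

# Hypothetical helper: mean of the non-missing values in x,
# or NA if every value in x is missing.
row_mean_or_na <- function(x) {
  x <- na.omit(x)  # removes both NA and NaN
  if (length(x) == 0) NA_real_ else mean(x)
}

# Apply row by row (MARGIN = 1) to the columns being averaged.
dat$C <- apply(dat[, c("A", "B")], 1, row_mean_or_na)

print(dat)
```

Because the missing values are removed inside your own function, you can add any other per-row logic there without relying on a function having an na.rm argument.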

JeremyS