I want to create a new column that contains the average of two other columns.
For example, my original table (dat) looks like this:

    A   B
1   1   NaN
2   3   2
3   2   5
4   4   4
5   6   NaN
6   5   3

I now want a column C that averages A and B, so I tried the following:

dat$C <- (dat$A + dat$B)/2

But what I get is this:

    A   B     C
1   1   NaN   NaN
2   3   2     2.5
3   2   5     3.5
4   4   4     4
5   6   NaN   NaN
6   5   3     4

What I want is this:

    A   B     C
1   1   NaN   1
2   3   2     2.5
3   2   5     3.5
4   4   4     4
5   6   NaN   6
6   5   3     4

So how can I calculate this new mean value column while working around the missing values in my dataset?

melanopygus
  • 4
    Try `df$C <- rowMeans(df, na.rm = TRUE)` where `df` is your `data.frame` – dickoa Jan 23 '14 at 22:31
  • @dickoa Thanks for the help. Unfortunately in my actual dataset I have other identifier columns that I'm not working into the mean so this doesn't work. – melanopygus Jan 23 '14 at 22:36
  • 4
    Just pass the data.frame subset to rowMeans : `dat$C <- rowMeans(dat[,c('A','B')], na.rm = TRUE)` – digEmAll Jan 23 '14 at 22:38

1 Answer

You can also do

dat$C <- apply(dat[, c("A", "B")], 1, function(x) mean(na.omit(x)))

`na.omit` is worth knowing when you write a more complex function: it is a standalone function that ships with R and can be applied to any vector, whereas `na.rm` is an argument that only certain functions (such as `mean` and `sum`) accept.
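As a reproducible sketch, here is the question's data rebuilt with an extra identifier column, since the asker mentions having such columns. The column name `id` and the helper vector `cols` are made up for illustration. Both the `apply`/`na.omit` approach and the `rowMeans` approach from the comments are restricted to A and B:

```r
# Rebuild the example data; `id` is a hypothetical identifier column
dat <- data.frame(
  id = letters[1:6],
  A  = c(1, 3, 2, 4, 6, 5),
  B  = c(NaN, 2, 5, 4, NaN, 3)
)

# Columns to average (everything else is left out of the mean)
cols <- c("A", "B")

# Row-wise mean via apply(), dropping NA/NaN with na.omit()
dat$C <- apply(dat[, cols], 1, function(x) mean(na.omit(x)))

# Equivalent row-wise mean via rowMeans(), as suggested in the comments
c_rowmeans <- rowMeans(dat[, cols], na.rm = TRUE)

print(dat)
# C is 1, 2.5, 3.5, 4, 6, 4, matching the desired table
```

For plain averages `rowMeans` should be the simpler and faster choice. The `apply` form mainly pays off when the per-row function does more than take a mean.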

JeremyS