How to get rid of NA when computing the average?

Question

b=c(1,4,3,NA)
c=c(NA,4,3,8)
res=(b+c)/2
# NA 4 3 NA

As you can see, the result is NA wherever either vector has an NA. I want the average of b and c where both have values; where only one of them is NA, I want the value from the other vector. The desired result would be:

res
1 4 3 8

First of all, use the function `rowMeans`; second, use `na.rm=TRUE`. — MichaelChirico, Aug 25 '15 at 12:11

akrun · Accepted Answer · 2015-08-25T13:53:52.500

We can use rowMeans after cbinding the vectors 'b' and 'c' to create a matrix. rowMeans has an argument (na.rm = TRUE) to drop NA values before averaging.

rowMeans(cbind(b,c), na.rm=TRUE)

Or colMeans after rbinding the vectors.

colMeans(rbind(b,c), na.rm=TRUE)
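
As a quick check, here is a minimal, self-contained run of both calls on the vectors from the question. The names `res_row` and `res_col` are only illustrative and do not appear in the original post.

```r
b <- c(1, 4, 3, NA)
c <- c(NA, 4, 3, 8)

# Mean across the two columns of cbind(b, c), skipping NA
res_row <- rowMeans(cbind(b, c), na.rm = TRUE)

# Mean down the two rows of rbind(b, c), skipping NA
res_col <- colMeans(rbind(b, c), na.rm = TRUE)

res_row
# [1] 1 4 3 8
res_col
# [1] 1 4 3 8
```

Where only one of the two values is present, the mean of that single value is the value itself, which is the behaviour asked for.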

If we have matrices instead of vectors, we can still use rowMeans/colMeans by looping through the columns/rows of one of the datasets (assuming both have the same dimensions). For example,

b <- matrix(c(1,4,3, NA, 2, 3, NA, 2), ncol=2)
c <- matrix(c(NA, 4, 3, 8, 1, NA, 3, 4), ncol=2)

We loop through the column sequence (seq_len(ncol(b))) with sapply, cbind the corresponding columns of 'b' and 'c', and get the rowMeans. The output is a matrix with the same dimensions as the input matrices.

m1 <- sapply(seq_len(ncol(b)), function(i)
             rowMeans(cbind(b[,i], c[,i]), na.rm=TRUE))
m1
#   [,1] [,2]
#[1,]    1  1.5
#[2,]    4  3.0
#[3,]    3  3.0
#[4,]    8  3.0
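
A loop-free variant, offered only as a sketch and not part of the original answer, is to stack the two matrices into a three-dimensional array and let rowMeans average over the third dimension via its `dims` argument. The names `arr` and `m_arr` are illustrative.

```r
b <- matrix(c(1, 4, 3, NA, 2, 3, NA, 2), ncol = 2)
c <- matrix(c(NA, 4, 3, 8, 1, NA, 3, 4), ncol = 2)

# Stack both matrices along a third dimension: 4 x 2 x 2
arr <- array(c(b, c), dim = c(dim(b), 2))

# dims = 2 keeps the first two dimensions and averages over the rest
m_arr <- rowMeans(arr, na.rm = TRUE, dims = 2)
m_arr
#      [,1] [,2]
# [1,]    1  1.5
# [2,]    4  3.0
# [3,]    3  3.0
# [4,]    8  3.0
```

This assumes both matrices have identical dimensions, as in the example above.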

Another option, instead of looping, is to replace the NA elements in both datasets with 0. We can use replace for that, add the two results, and divide by the number of non-NA elements at each position.

m2 <- (replace(b, which(is.na(b)), 0) + replace(c, which(is.na(c)), 0))
m2/(2-(is.na(b)+is.na(c)))
#      [,1] [,2]
#[1,]    1  1.5
#[2,]    4  3.0
#[3,]    3  3.0
#[4,]    8  3.0
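
One edge case worth noting: if both inputs are NA at the same position, the division above becomes 0/0 and yields NaN, and rowMeans with na.rm = TRUE returns NaN there as well. The sketch below uses made-up vectors `x` and `y` (not from the question) to show this and one possible way to turn the NaN back into NA.

```r
# Hypothetical data with an NA in both vectors at position 2
x <- c(1, NA, 3)
y <- c(NA, NA, 5)

# Zero-fill and divide by the count of non-NA values per position
res <- (replace(x, is.na(x), 0) + replace(y, is.na(y), 0)) /
  (2 - (is.na(x) + is.na(y)))
res
# [1]   1 NaN   4

# rowMeans gives the same result, including the NaN
rowMeans(cbind(x, y), na.rm = TRUE)
# [1]   1 NaN   4

# Optionally report "no data" as NA rather than NaN
res[is.nan(res)] <- NA
res
# [1]  1 NA  4
```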

The above code can be made more compact by using NAer from library(qdap), which replaces NA values with 0 by default.

library(qdap)
(NAer(b) + NAer(c))/(2-(is.na(b)+is.na(c)))
#  1   2
#1 1 1.5
#2 4 3.0
#3 3 3.0
#4 8 3.0

Thanks @akrun. What if `b` and `c` are both matrices? How can we apply your solution? My real data are matrices; I only gave a simple example. — temor, Aug 25 '15 at 12:54
@temor It depends on the dimensions of `b`. Can you be a bit more specific? If `nrow(b)` is the same as the `length` of `c`, we can `cbind` both and get the `rowMeans`. If `ncol(b)` is the same as the `length` of `c`, we `rbind` them. — akrun, Aug 25 '15 at 12:56
The dimensions of b and c are the same: each has 1000 columns and 500 rows. — temor, Aug 25 '15 at 12:58
@temor Try `sapply(seq_len(ncol(b)), function(i) rowMeans(cbind(b[,i], c[,i]), na.rm=TRUE))` — akrun, Aug 25 '15 at 13:03
