R Sum complete cases of two columns

Question

How can I sum the number of complete cases of two columns?

With c equal to:

      a  b
[1,] NA NA
[2,]  1  1
[3,]  1  1
[4,] NA  1

Applying something like

rollapply(c, 2, function(x) sum(complete.cases(x)),fill=NA)

I'd like to get back a single number, 2 in this case. This will be for a large data set with many columns, so I'd like to use rollapply across the whole set instead of simply doing sum(complete.cases(a,b)).

Am I overthinking it?

Thanks!

To which package does `rollapply` belong? And I do not see why `sum( complete.cases( c ) )` shouldn't be the best code for your problem. — Beasterfield, Jan 10 '14 at 09:25

Simon O'Hanlon · Answer 1 · 2014-01-10T09:41:43.463

Did you try sum(complete.cases(x))?!

set.seed(123)
x <- matrix( sample( c(NA,1:5) , 15 , TRUE ) , 5 )
#     [,1] [,2] [,3]
#[1,]    1   NA    5
#[2,]    4    3    2
#[3,]    2    5    4
#[4,]    5    3    3
#[5,]    5    2   NA


sum(complete.cases(x))
#[1] 3

To find the complete.cases() of the first two columns:

sum(complete.cases(x[,1:2]))
#[1] 4

And to apply to two columns of a matrix across the whole matrix you could do this:

#  Bigger data for example
set.seed(123)
x <- matrix( sample( c(NA,1:5) , 50 , TRUE ) , 5 )
#     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#[1,]    1   NA    5    5    5    4    5    2   NA    NA
#[2,]    4    3    2    1    4    3    5    4    2     1
#[3,]    2    5    4   NA    3    3    4    1    2     2
#[4,]    5    3    3    1    5    1    4    1    2     1
#[5,]    5    2   NA    5    3   NA   NA    1   NA     5

# Column indices
id <- seq( 1 , ncol(x) , by = 2 )
#[1] 1 3 5 7 9
apply( cbind(id,id+1) , 1 , function(i) sum(complete.cases(x[,c(i)])) )
#[1] 4 3 4 4 3

complete.cases() works row-wise across the whole data.frame or matrix returning TRUE for those rows which are not missing any data. A minor aside, "c" is a bad variable name because c() is one of the most commonly used functions.
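To see the row-wise behaviour on the data from the question, here is a minimal sketch. The matrix is rebuilt by hand from the printout, and the names `cc`, `ok` and `n_complete` are my own (chosen so that `c()` is not masked), so treat it as an illustration rather than the asker's actual code.

```r
# The question's matrix, rebuilt by hand; `cc` is a hypothetical name
cc <- cbind(a = c(NA, 1, 1, NA), b = c(NA, 1, 1, 1))

# One logical per row: TRUE only when no column in that row is NA
ok <- complete.cases(cc)

# TRUE counts as 1, so the sum is the number of complete rows
n_complete <- sum(ok)
```

Here `ok` is FALSE TRUE TRUE FALSE and `n_complete` is 2, which is the single number the question asks for.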

`x` appears to be a matrix in the original post, but that doesn't matter for the solution. — Roland, Jan 10 '14 at 09:27
Hi Simon, absolutely. As I stated I need to do this over a large matrix, two rows at a time each giving me back a single integer, but my problem is it gives back a matrix. — Travis Liew, Jan 10 '14 at 09:27
@Ubobo check the edit at the bottom. Just subset the matrix to the columns you want. — Simon O'Hanlon, Jan 10 '14 at 09:31
@Ubobo Do you want to count pairwise the complete cases? And do you mean pairs of columns or rows? Maybe you should clarify your question. — Beasterfield, Jan 10 '14 at 09:33

Roland · Accepted Answer · 2014-01-10T09:56:35.573


You can calculate the number of complete cases in neighboring matrix columns using rollapply like this:

m <- matrix(c(NA,1,1,NA,1,1,1,1),ncol=4)
#     [,1] [,2] [,3] [,4]
#[1,]   NA    1    1    1
#[2,]    1   NA    1    1

library(zoo)

rowSums(rollapply(is.na(t(m)), 2, function(x) !any(x)))
#[1] 0 1 2
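If zoo is not available, the same neighbouring-pair counts can probably be had in base R by comparing the matrix with a copy of itself shifted by one column. This is only a sketch of an alternative, not what the answer above does internally. The names `present`, `left`, `right` and `pair_counts` are my own, and it assumes the matrix has at least two columns.

```r
m <- matrix(c(NA, 1, 1, NA, 1, 1, 1, 1), ncol = 4)

# TRUE where a value is present
present <- !is.na(m)

# Left and right member of each neighbouring column pair
left  <- present[, -ncol(m), drop = FALSE]
right <- present[, -1, drop = FALSE]

# A row is complete for a pair when both members are present
pair_counts <- colSums(left & right)
```

For this `m`, `pair_counts` is 0 1 2, matching the rollapply result. Everything is vectorised, so no per-window function call is needed.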

edited Jan 10 '14 at 09:56

answered Jan 10 '14 at 09:43


Zbynek · Answer 3 · 2014-01-10T09:41:29.067


This should work for both a matrix and a data.frame:

> sum(apply(c, 1, function(x)all(!is.na(x))))

[1] 2

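You could then simply iterate over neighbouring column pairs of a large matrix M, storing one count per pair:

agreement <- numeric(ncol(M) - 1)
for (i in 1:(ncol(M) - 1)) {
    # neighbouring columns i and i+1
    pair <- M[, c(i, i + 1)]
    # count rows with no NA in either column
    agreement[i] <- sum(apply(pair, 1, function(x) all(!is.na(x))))
}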

edited Jan 10 '14 at 09:41

answered Jan 10 '14 at 09:28

