0

How can I sum the number of complete cases of two columns?

With c equal to:

      a  b
[1,] NA NA
[2,]  1  1
[3,]  1  1
[4,] NA  1

Applying something like

rollapply(c, 2, function(x) sum(complete.cases(x)),fill=NA)

I'd like to get back a single number, 2 in this case. This will be for a large data set with many columns, so I'd like to use rollapply across the whole set instead of simply doing sum(complete.cases(a,b)).

Am I overthinking it?

Thanks!

Travis Liew
  • To which package does `rollapply` belong? And I do not see why `sum( complete.cases( c ) )` shouldn't be the best code for your problem. – Beasterfield Jan 10 '14 at 09:25

3 Answers

2

Did you try sum(complete.cases(x))?!

set.seed(123)
x <- matrix( sample( c(NA,1:5) , 15 , TRUE ) , 5 )
#     [,1] [,2] [,3]
#[1,]    1   NA    5
#[2,]    4    3    2
#[3,]    2    5    4
#[4,]    5    3    3
#[5,]    5    2   NA


sum(complete.cases(x))
#[1] 3

To find the complete.cases() of the first two columns:

sum(complete.cases(x[,1:2]))
#[1] 4

And to apply to two columns of a matrix across the whole matrix you could do this:

#  Bigger data for example
set.seed(123)
x <- matrix( sample( c(NA,1:5) , 50 , TRUE ) , 5 )
#     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#[1,]    1   NA    5    5    5    4    5    2   NA    NA
#[2,]    4    3    2    1    4    3    5    4    2     1
#[3,]    2    5    4   NA    3    3    4    1    2     2
#[4,]    5    3    3    1    5    1    4    1    2     1
#[5,]    5    2   NA    5    3   NA   NA    1   NA     5

# Column indices
id <- seq( 1 , ncol(x) , by = 2 )
#[1] 1 3 5 7 9
apply( cbind(id,id+1) , 1 , function(i) sum(complete.cases(x[,c(i)])) )
#[1] 4 3 4 4 3

complete.cases() works row-wise across the whole data.frame or matrix, returning TRUE for each row that is not missing any data. As a minor aside, "c" is a bad variable name because it masks c(), one of the most commonly used functions.
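To make the row-wise behaviour concrete, here is a minimal sketch on the matrix from the question. The name `cc` is chosen only for this illustration (so that c() is not masked); it does not appear in the original post.

```r
# The question's matrix, rebuilt under the illustrative name `cc`
cc <- cbind(a = c(NA, 1, 1, NA), b = c(NA, 1, 1, 1))

# One logical per row: TRUE only when that row has no NA in any column
ok <- complete.cases(cc)
print(ok)

# Summing the logicals counts the complete rows
n_complete <- sum(ok)
print(n_complete)
```

Rows 2 and 3 are the only ones without an NA, so the count is 2, the number the question asks for.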

Simon O'Hanlon
  • `x` appears to be a matrix in the original post, but that doesn't matter for the solution. – Roland Jan 10 '14 at 09:27
  • Hi Simon, absolutely. As I stated I need to do this over a large matrix, two rows at a time each giving me back a single integer, but my problem is it gives back a matrix. – Travis Liew Jan 10 '14 at 09:27
  • @Ubobo check the edit at the bottom. Just subset the matrix to the columns you want. – Simon O'Hanlon Jan 10 '14 at 09:31
  • @Ubobo Do you want to count pairwise the complete cases? And do you mean pairs of columns or rows? Maybe you should clarify your question. – Beasterfield Jan 10 '14 at 09:33
1

You can calculate the number of complete cases in neighboring matrix columns using rollapply like this:

m <- matrix(c(NA,1,1,NA,1,1,1,1),ncol=4)
#     [,1] [,2] [,3] [,4]
#[1,]   NA    1    1    1
#[2,]    1   NA    1    1

library(zoo)

rowSums(rollapply(is.na(t(m)), 2, function(x) !any(x)))
#[1] 0 1 2
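
As a cross-check that does not depend on zoo, the same counts for adjacent column pairs can be sketched in base R. This is an alternative formulation rather than the method used in the answer; the names `starts` and `pair_counts` are chosen for this illustration.

```r
# Same 2 x 4 matrix as in the answer above
m <- matrix(c(NA, 1, 1, NA, 1, 1, 1, 1), ncol = 4)

# Left-hand column index of every adjacent pair: (1,2), (2,3), (3,4)
starts <- seq_len(ncol(m) - 1)

# For each pair, count the rows with no NA in either column
pair_counts <- vapply(
  starts,
  function(i) sum(complete.cases(m[, c(i, i + 1), drop = FALSE])),
  integer(1)
)
print(pair_counts)
```

For this `m` the result is 0 1 2, matching the rollapply output: no row is complete in columns 1 and 2, one row is complete in columns 2 and 3, and both rows are complete in columns 3 and 4.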
Roland
0

This should work for both a matrix and a data.frame:

> sum(apply(c, 1, function(x)all(!is.na(x))))

[1] 2

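and you could simply iterate through a large matrix M, one pair of adjacent columns at a time, storing one count per pair:

agreement <- integer(ncol(M) - 1)
for (i in seq_len(ncol(M) - 1)) {
    cols <- M[, c(i, i + 1)]
    # count the rows with no NA in either column of this pair
    agreement[i] <- sum(apply(cols, 1, function(x) all(!is.na(x))))
}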
Zbynek