46

I'm very new to R and I have a .rda file that contains a matrix of gene IDs, with counts for each ID in 96 columns.

I want to get separate counts for the number of non-zero items in each column. I've been trying the sum() function in a loop, but perhaps I don't understand loop syntax in R. Any help appreciated. Thanks!

Forest


4 Answers

79

What about:

apply(your.matrix, 2, function(c) sum(c != 0))

Does this help?

edit:

Even better:

colSums(your.matrix != 0)

edit 2:

Here we go, with an example for ya:

> example <- matrix(sample(c(0, 0, 0, 100), size = 70, replace = TRUE), ncol = 7)
> example
      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
 [1,]    0  100    0    0  100    0  100
 [2,]  100    0    0    0    0    0  100
 [3,]    0    0    0    0    0    0  100
 [4,]    0  100    0    0    0    0    0
 [5,]    0    0  100  100    0    0    0
 [6,]    0    0    0  100    0    0    0
 [7,]    0  100  100    0    0    0    0
 [8,]  100    0    0    0    0    0    0
 [9,]  100  100    0    0  100    0    0
[10,]    0    0    0    0    0  100    0
> colSums(example != 0)
[1] 3 4 2 2 2 1 3

(New example: the previous one used '1' values, which did not make it clear that we are counting the non-zero cells rather than summing their contents.)
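Since the question mentions trying sum() in a loop, here is a minimal sketch of what that loop could look like next to the two one-liners above. The name `counts` and the seeded sample data are made up for illustration; they stand in for the matrix loaded from the .rda file.

```r
# Hypothetical stand-in for the real matrix; `counts` is an assumed name.
set.seed(1)
counts <- matrix(sample(c(0, 0, 0, 100), size = 70, replace = TRUE), ncol = 7)

# Explicit loop: preallocate the result, then fill in one column at a time.
nonzero_loop <- integer(ncol(counts))
for (j in seq_len(ncol(counts))) {
  # counts[, j] != 0 is a logical vector; sum() counts its TRUE values.
  nonzero_loop[j] <- sum(counts[, j] != 0)
}

# The same computation with apply() over columns (MARGIN = 2) and with colSums().
nonzero_apply <- apply(counts, 2, function(col) sum(col != 0))
nonzero_colsums <- colSums(counts != 0)

print(nonzero_loop)
print(nonzero_colsums)
```

All three should give the same counts; colSums() is just the shortest way to write it.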

Jealie
  • Sort of... this looks like it's giving me the sum of all counts for each column. Is there a way to modify it so that I get the number of elements of each column that are non-zero? So, if there are 1000 rows per column and a given column has 72 non-zero rows, the count for that column is 72? Thanks. – Forest Mar 09 '14 at 19:34
  • I believe you are mistaken: this code gives you *exactly* what you want... I am adding an example to convince you :) – Jealie Mar 09 '14 at 19:35
  • This should work. `c!=0` is a vector of TRUE or FALSE, which gets coerced to 1 or 0 by `sum(...)`. So you are adding up 1's whenever c!=0, and that gives the count of non-zero elements. – jlhoward Mar 09 '14 at 19:38
  • I'm using this on a data frame containing all numeric values, but it's returning 'NA' for every column. I've tried a variation, df = colSums(df > 0), to the same effect. Can you advise please? – Finger Picking Good Jul 09 '18 at 10:17
  • @FingerPickingGood you are likely getting this result because you have NA values in every column. Try adding the argument `na.rm=TRUE`, for example: `colSums(df != 0, na.rm=TRUE)` – Jealie Jul 10 '18 at 11:18
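To illustrate the two points made in the comments (TRUE/FALSE being counted as 1/0, and NA values turning the whole column count into NA), here is a small sketch. The vector `x` and the data frame `df` are invented for the example.

```r
# A made-up vector with one missing value.
x <- c(0, 5, NA, 12)

# Comparing with 0 gives a logical vector; NA stays NA.
is_nonzero <- x != 0

# sum() treats TRUE as 1 and FALSE as 0, but a single NA makes the result NA.
count_with_na <- sum(is_nonzero)

# na.rm = TRUE drops the NA before summing.
count_without_na <- sum(is_nonzero, na.rm = TRUE)

# Same idea column by column on a made-up data frame.
df <- data.frame(a = c(0, 5, NA), b = c(0, 0, 3))
cols_with_na <- colSums(df != 0)
cols_without_na <- colSums(df != 0, na.rm = TRUE)

print(cols_with_na)
print(cols_without_na)
```

Note that na.rm = TRUE silently ignores the missing cells, so whether that is the right count depends on what NA means in your data.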
8

With x being a column or vector:

length(which(x != 0))
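As a rough sketch of how this could be applied to every column of a matrix: which() returns only the positions that are TRUE, so NA values are skipped without needing na.rm. The matrix `m` below is made up for illustration.

```r
# A made-up vector: two non-zero values, two zeros, one NA.
x <- c(0, 2, NA, 7, 0)

# which() keeps only the TRUE positions, so the NA is not counted.
n_nonzero <- length(which(x != 0))

# Applied to each column of a small made-up matrix.
m <- cbind(a = c(0, 1, 2), b = c(0, 0, NA))
per_col <- apply(m, 2, function(x) length(which(x != 0)))

print(n_nonzero)
print(per_col)
```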

Ch3steR
Ayse Ozhan
4

Another method using plyr's numcolwise:

library(plyr)

dat <- data.frame(a = sample(1:25, 25),
                  b = rep(0, 25),
                  c = sample(1:25, 25))
nonzero <- function(x) sum(x != 0)
numcolwise(nonzero)(dat)
   a b  c
1 25 0 25
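If you would rather not load plyr, a similar result can be sketched in base R with vapply(). This is an alternative under the same made-up data, not what numcolwise does internally; it returns a named vector rather than a one-row data frame.

```r
# Same shape of data as above; sample(1:25, 25) is a permutation, so no zeros.
dat <- data.frame(a = sample(1:25, 25),
                  b = rep(0, 25),
                  c = sample(1:25, 25))

nonzero <- function(x) sum(x != 0)

# Keep only the numeric columns, then count the non-zero entries in each.
num_cols <- vapply(dat, is.numeric, logical(1))
res <- vapply(dat[num_cols], nonzero, integer(1))

print(res)
```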
maloneypatr
1

Here is a way to count, for each row, the number of columns that contain a zero. This one uses dplyr.

First, the data frame must be put into row-wise mode with rowwise(); then the columns are selected with c_across(), which returns a vector that can be passed to any function that takes vectors. Finally, the result is assigned to a new column using mutate().

library(dplyr)

df <- data.frame(a = sample(0:10, 100, replace = TRUE),
                 b = sample(0:10, 100, replace = TRUE),
                 c = sample(0:10, 100, replace = TRUE))

df %>%
  rowwise() %>%
  mutate(N_zeros = sum(c_across(everything()) == 0))

This idea can also be modified for any other operation that would take all or a subset of columns for row-wise operation.

See documentation of c_across() for more details. Tested with dplyr version 1.0.6.
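For comparison, here is a base R sketch of the same per-row zero count using rowSums(), alongside the per-column non-zero count the question asked about. The seed and the variable names `n_zeros` and `nonzero_per_col` are chosen only for this example.

```r
# Made-up data in the same shape as above; the seed is only for repeatability.
set.seed(42)
df <- data.frame(a = sample(0:10, 100, replace = TRUE),
                 b = sample(0:10, 100, replace = TRUE),
                 c = sample(0:10, 100, replace = TRUE))

# Per row: how many of the three columns are zero.
n_zeros <- rowSums(df == 0)

# Per column: how many rows are non-zero (what the question asked for).
nonzero_per_col <- colSums(df != 0)

print(head(n_zeros))
print(nonzero_per_col)
```

For a plain "count zeros across columns" this is likely simpler than rowwise(); the dplyr approach is more useful when the per-row function is not something rowSums() can express.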

Viktor Horváth