Hi, I am using a matrix of gene expression fragment counts to find differentially expressed genes. I would like to know how to remove the rows whose values are all 0. That would make my data set more compact and give fewer spurious results in the downstream analysis I run on this matrix.

Input

gene         ZPT.1  ZPT.0  ZPT.2  ZPT.3  PDGT.1  PDGT.0
XLOC_000001  3516   626    1277   770    4309    9030
XLOC_000002  342    82     185    72     835     1095
XLOC_000003  2000   361    867    438    454     687
XLOC_000004  143    30     67     37     90      236
XLOC_000005  0      0      0      0      0       0
XLOC_000006  0      0      0      0      0       0
XLOC_000007  0      0      0      0      1       3
XLOC_000008  0      0      0      0      0       0
XLOC_000009  0      0      0      0      0       0
XLOC_000010  7      1      5      3      0       1
XLOC_000011  63     10     19     15     92      228

Desired output

gene         ZPT.1  ZPT.0  ZPT.2  ZPT.3  PDGT.1  PDGT.0
XLOC_000001  3516   626    1277   770    4309    9030
XLOC_000002  342    82     185    72     835     1095
XLOC_000003  2000   361    867    438    454     687
XLOC_000004  143    30     67     37     90      236
XLOC_000007  0      0      0      0      1       3
XLOC_000010  7      1      5      3      0       1
XLOC_000011  63     10     19     15     92      228

For now I only want to remove the rows where all the frag count columns are 0. If a row has some values that are 0 and others that are non-zero, I would like to keep that row intact, as you can see in my example above.

Please let me know how to do this.

Arun
ivivek_ngs
  • `df[rowSums(df[, -1])>0, ]` – Arun Aug 05 '13 at 10:22
  • @Arun a minor nit: the OP didn't specify whether he's got an array of integers or floats, so to be careful, you might want to check that `rowSums` is greater than 1e-10 or something. – Carl Witthoft Aug 05 '13 at 13:11
  • @CarlWitthoft, I guess the bioinformatician reflex kicked in. These are read counts from gene expression data. They are discrete counts and therefore are likely to be integers (>= 0). – Arun Aug 05 '13 at 13:41
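
A minimal sketch of the `rowSums` approach from the comments, run against the sample data in the question. The names `keep` and `filtered` are only illustrative, and the filter assumes the counts are never negative, as discussed above.

```r
# Sample data copied from the question; the first column holds the gene ID.
df <- read.table(header = TRUE, text = "
gene ZPT.1 ZPT.0 ZPT.2 ZPT.3 PDGT.1 PDGT.0
XLOC_000001 3516 626 1277 770 4309 9030
XLOC_000002 342 82 185 72 835 1095
XLOC_000003 2000 361 867 438 454 687
XLOC_000004 143 30 67 37 90 236
XLOC_000005 0 0 0 0 0 0
XLOC_000006 0 0 0 0 0 0
XLOC_000007 0 0 0 0 1 3
XLOC_000008 0 0 0 0 0 0
XLOC_000009 0 0 0 0 0 0
XLOC_000010 7 1 5 3 0 1
XLOC_000011 63 10 19 15 92 228
")

# Sum the count columns of each row (everything except the gene column).
# With non-negative counts, a row sums to 0 only if every value is 0.
keep <- rowSums(df[, -1]) > 0

# Keep only the rows flagged TRUE.
filtered <- df[keep, ]

nrow(filtered)  # 7 rows remain, matching the desired output
```

The four all-zero genes (XLOC_000005, 6, 8 and 9) are dropped, while XLOC_000007 and XLOC_000010 stay because they have at least one non-zero count.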

2 Answers

df[apply(df[,-1], 1, function(x) !all(x==0)),]
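
For readers new to R, the one-liner can be unpacked into separate steps. This is only a sketch on a small made-up data frame; `counts`, `all_zero` and `result` are illustrative names, not part of the original answer.

```r
# Toy data: an ID column followed by two numeric columns (made up for illustration).
df <- data.frame(
  gene = c("g1", "g2", "g3"),
  s1   = c(5, 0, 0),
  s2   = c(0, 0, 2)
)

# Step 1: drop the first (ID) column so only the numeric values remain.
counts <- df[, -1]

# Step 2: apply(counts, 1, f) calls f once per row (1 means "by row").
# all(x == 0) is TRUE only when every value in that row is zero.
all_zero <- apply(counts, 1, function(x) all(x == 0))

# Step 3: keep the rows that are NOT all zero.
result <- df[!all_zero, ]

result$gene  # "g1" "g3"
```

Row `g2` is the only one where every value is zero, so it is the only row removed.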
bartektartanus
  • Can you please elaborate on how I should adapt it? I am not able to understand the command you wrote. Sorry for my limited knowledge of R. – ivivek_ngs Aug 05 '13 at 11:51
  • df is your data frame. The rest stays the same. – bartektartanus Aug 05 '13 at 12:54
  • Should work, but Arun's solution in his comment above is much cleaner. – Carl Witthoft Aug 05 '13 at 13:09
  • Yes, if you assume that you don't have any negative values in the data frame. – bartektartanus Aug 05 '13 at 13:36
  • @bartektartanus, these are discrete counts as they are gene expression values. They don't take values < 0 and they don't take floating-point values. Even so, there's no need for `apply` here. You could check for `df[rowSums(df[, -1] > 0) != 0, ]` – Arun Aug 05 '13 at 13:42
  • Rather `df[!rowSums(df[, -1] == 0) == (ncol(df)-1), ]` – Arun Aug 05 '13 at 13:48
  • What is clean is a matter of preference and taste. I, for one, consider the use of `any` and `all` to be closer to the question than the `rowSums` hack, which implies non-obvious assumptions about the data. Read this solution literally as "not all x equal zero". – Bernhard Jul 19 '21 at 07:31
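
The assumption debated in these comments can be illustrated with a small sketch. The data frame below is made up and deliberately contains a negative value, which real read counts would not; `by_sum`, `by_count` and `by_all` are illustrative names. Note that `by_count` tests `!= 0` rather than the `> 0` used in the comment, so that negative entries are counted as non-zero.

```r
# Toy data with a negative value, to show where summing rows goes wrong.
df <- data.frame(
  id = c("a", "b", "c"),
  x  = c(1, -1, 0),
  y  = c(2,  1, 0)
)

# Summing the values: row "b" sums to 0 even though it is not all zero.
by_sum <- df[rowSums(df[, -1]) > 0, ]

# Counting the non-zero entries per row: no assumption about sign.
by_count <- df[rowSums(df[, -1] != 0) > 0, ]

# The apply/all version: "not all values equal zero".
by_all <- df[apply(df[, -1], 1, function(v) !all(v == 0)), ]

by_sum$id    # "a"
by_count$id  # "a" "b"
by_all$id    # "a" "b"
```

For non-negative integer counts such as the ones in the question, all three give the same rows, so the choice is mostly about readability and speed.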

A lot of options to do this within the tidyverse have been posted here: How to remove rows where all columns are zero using dplyr pipe

My preferred option is using rowwise():

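library(tidyverse)

# Keep the rows whose listed columns do not sum to zero.
# col1, col2, col3 stand in for your own count columns.
df <- df %>%
    rowwise() %>%
    filter(sum(c(col1, col2, col3)) != 0) %>%
    ungroup()  # drop the row-wise grouping afterwards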
PatrickW