
I have a large dataframe which looks like the toy example below.

          Sample1        Sample2      Sample3      Sample4
  Gene1     0               0            0            1 
  Gene2     1               0            0            1 
  Gene3     1               1            1            0

I want to remove all genes which were equal to 0 in at least two samples. Only Gene3 should remain.

The answer to the following question came close, but it is not specific enough for my case: How to remove rows with 0 values using R

df[apply(df[,-1], 1, function(x) !all(x==0)),]

Can it be adjusted to remove rows where x==0 two or more times?

Chris Ruehlemann
Krutik

2 Answers

2

Here is a one-liner. Note that rowSums is coded in C and is fast.

df[!rowSums(df == 0) >= 2, , drop = FALSE]
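
As a minimal sketch of how this behaves on the toy data from the question (the construction of `df` here is an assumption, with gene names stored as row names, and `zero_count` and `kept` are illustrative names): in R, `!` binds less tightly than `>=`, so the condition is read as `!(rowSums(df == 0) >= 2)`, which is the same as keeping rows with fewer than two zeros.

```r
# Toy data from the question; gene names are assumed to be row names.
df <- data.frame(
  Sample1 = c(0, 1, 1),
  Sample2 = c(0, 0, 1),
  Sample3 = c(0, 0, 1),
  Sample4 = c(1, 1, 0),
  row.names = c("Gene1", "Gene2", "Gene3")
)

# Number of zero entries per gene.
zero_count <- rowSums(df == 0)

# Keep genes with fewer than two zeros; equivalent to !(zero_count >= 2).
kept <- df[zero_count < 2, , drop = FALSE]
print(kept)
```

With this data only Gene3 is kept, and `drop = FALSE` ensures the result stays a data frame even if only one sample column is present.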
Rui Barradas
1

Two solutions:

df[-which(apply(df, 1, function(x) sum(x == 0) >= 2)),]

or:

subset(df, !rowSums(df == 0) >= 2)
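
One edge case may be worth checking with the `-which(...)` form: if no row meets the removal condition, `which()` returns an empty index and negative indexing then selects no rows at all. The sketch below illustrates this on a hypothetical data frame `df2` (not from the question) and shows logical negation as one possible alternative.

```r
# Hypothetical data frame in which no row has two or more zeros.
df2 <- data.frame(
  Sample1 = c(1, 1),
  Sample2 = c(1, 0),
  row.names = c("GeneA", "GeneB")
)

# TRUE for rows that should be removed; both entries are FALSE here.
drop_row <- apply(df2, 1, function(x) sum(x == 0) >= 2)

# which() returns integer(0), and df2[-integer(0), ] selects no rows.
via_which <- df2[-which(drop_row), ]
print(nrow(via_which))

# Logical negation keeps every row that is not flagged for removal.
via_logical <- df2[!drop_row, , drop = FALSE]
print(nrow(via_logical))
```

The `subset()` form takes a logical condition directly, so it is not affected by this.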
Chris Ruehlemann