Remove rows in a dataframe if 0 is found X number of times

Question

I have a large dataframe which looks like the toy example below.

          Sample1        Sample2      Sample3      Sample4
  Gene1     0               0            0            1 
  Gene2     1               0            0            1 
  Gene3     1               1            1            0

I want to remove all genes which were equal to 0 in at least two samples. Only Gene3 should remain.

The answer to this question was close but not specific enough for my case: "How to remove rows with 0 values using R".

df[apply(df[,-1], 1, function(x) !all(x==0)),]

Can it be adjusted to remove rows if x==0 two or more times?

`df[!rowSums(df == 0) >= 2, , drop = FALSE]`. – Rui Barradas Jan 28 '21 at 12:07

Rui Barradas · Accepted Answer · 2021-01-28T16:09:49.707

2

Here is a one-liner. Note that rowSums is coded in C and is fast.

df[!rowSums(df == 0) >= 2, , drop = FALSE]
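To check the one-liner against the toy data from the question, here is a minimal sketch. It assumes the gene names are stored as row names rather than in a column, and `drop_zero_rows` and its `max_zeros` argument are hypothetical names introduced only for illustration. Note that in R `!` binds more loosely than `>=`, so `!rowSums(df == 0) >= 2` is read as `!(rowSums(df == 0) >= 2)`, which is the same condition as `rowSums(df == 0) < 2`.

```r
# Toy data from the question; gene names are assumed to be row names.
df <- data.frame(
  Sample1 = c(0, 1, 1),
  Sample2 = c(0, 0, 1),
  Sample3 = c(0, 0, 1),
  Sample4 = c(1, 1, 0),
  row.names = c("Gene1", "Gene2", "Gene3")
)

# Hypothetical helper (not part of the answer): keep only the rows that
# contain fewer than `max_zeros` zeros.
drop_zero_rows <- function(data, max_zeros = 2) {
  zero_counts <- rowSums(data == 0, na.rm = TRUE)  # zeros per row
  data[zero_counts < max_zeros, , drop = FALSE]
}

result <- drop_zero_rows(df)
print(result)
```

With the threshold left at 2, only Gene3 (one zero) should remain. Changing `max_zeros` covers the general "X number of times" case from the title.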

edited Jan 28 '21 at 16:09

answered Jan 28 '21 at 12:09

Re-reading the question it seems as if @Krutik asked to remove rows if the number of zeroes >= 2? – rps1227 Jan 28 '21 at 13:42
@rps1227 You are right, corrected. Thanks. – Rui Barradas Jan 28 '21 at 16:10

score 1 · Answer 2 · answered Jan 28 '21 at 12:11

Two solutions:

df[!apply(df, 1, function(x) sum(x == 0) >= 2), , drop = FALSE]

or:

subset(df, rowSums(df == 0) < 2)
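A note on dropping rows by negative index with `which()`, a common alternative to logical indexing: when no row meets the condition, `which()` returns an empty vector, and negating it selects nothing, so every row is lost instead of kept. The sketch below uses a made-up two-row data frame with no zeros to show the difference; the variable names are illustrative only.

```r
# Made-up data with no zeros, so no row should be removed.
df <- data.frame(a = c(1, 2), b = c(3, 4))

# Condition: row has two or more zeros (FALSE for every row here).
cond <- rowSums(df == 0) >= 2

# Negative indexing: which(cond) is integer(0), and -integer(0) is still
# integer(0), so this selects zero rows.
neg_idx <- df[-which(cond), ]

# Logical indexing keeps both rows, as intended.
logical_idx <- df[!cond, , drop = FALSE]

print(nrow(neg_idx))
print(nrow(logical_idx))
```

For that reason the logical form is the safer default when the condition may match nothing.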

Chris Ruehlemann
