My goal is the same as that explained here: How to remove rows with 0 values using R

I have created df1 by reading a .csv file containing about 50000 rows, many of which have 0 0 0 0 in the FPKM columns. Input data frame:

tracking_id   gene_id locus   FPKM-R1 FPKM-R2 FPKM-R3 FPKM-R4
NR_1  DDX1    chr1:1  0.031   0.056   0.0045  0.0345
NR_2  DDX2    chr1:2  3.06588 3.65986 5.32698 3.6996
NR_3  DDX3    chr1:4  0   0   0   0
NR_4  DDX4    chr1:4  0   0   0   0
NM_5  DDX5    chr1:5  0   0   0.046985    0

Desired output data frame:

tracking_id   gene_id locus   FPKM-R1 FPKM-R2 FPKM-R3 FPKM-R4
NR_1  DDX1    chr1:1  0.031   0.056   0.0045  0.0345
NR_2  DDX2    chr1:2  3.06588 3.65986 5.32698 3.6996
NM_5  DDX5    chr1:5  0   0   0.046985    0

I am trying to apply the rowSums solution proposed in that earlier post:

df2 <- df1[rowSums(df1[, -3]) > 0, ]

but am getting

Error in rowSums(df1[, -3]) : 'x' must be numeric

I have checked that the data in the four FPKM-R columns is numeric. I'm at a loss as to why this error is occurring and what I can do to fix it. Thanks!

jap

1 Answer

It looks like you want to examine all columns but the first three.

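df1[, -3] is the data frame with only the third column removed, so the non-numeric columns tracking_id and gene_id are still passed to rowSums, which is why it reports that 'x' must be numeric. You want to drop columns 1, 2 and 3, which is written 1:3 in R, giving this expression: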

df2 <- df1[rowSums(df1[, -(1:3)]) > 0, ]
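
As a rough check, here is a small self-contained sketch that rebuilds the five sample rows from the question and applies the same idea. The data frame is typed in by hand rather than read from the real .csv, and the name fpkm_cols is only an illustrative helper. It selects the FPKM columns by name instead of by position, which is one possible variation and assumes the column names start with "FPKM" (read.csv would by default turn FPKM-R1 into FPKM.R1, which still matches).

```r
# Rebuild the sample data from the question by hand.
# check.names = FALSE keeps names such as "FPKM-R1" unchanged.
df1 <- data.frame(
  tracking_id = c("NR_1", "NR_2", "NR_3", "NR_4", "NM_5"),
  gene_id     = c("DDX1", "DDX2", "DDX3", "DDX4", "DDX5"),
  locus       = c("chr1:1", "chr1:2", "chr1:4", "chr1:4", "chr1:5"),
  `FPKM-R1`   = c(0.031, 3.06588, 0, 0, 0),
  `FPKM-R2`   = c(0.056, 3.65986, 0, 0, 0),
  `FPKM-R3`   = c(0.0045, 5.32698, 0, 0, 0.046985),
  `FPKM-R4`   = c(0.0345, 3.6996, 0, 0, 0),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Dropping only column 3 leaves the text columns 1 and 2 in place;
# this shows which of the remaining columns rowSums() could handle.
sapply(df1[, -3], is.numeric)

# Pick the FPKM columns by name (hypothetical helper variable),
# then keep the rows whose FPKM values do not sum to zero.
fpkm_cols <- grep("^FPKM", names(df1), value = TRUE)
df2 <- df1[rowSums(df1[, fpkm_cols]) > 0, ]

df2$tracking_id
```

With the sample rows above, the rows for NR_3 and NR_4 are dropped and the other three are kept. Selecting by name is slightly more robust than -(1:3) if the column order in the file ever changes, at the cost of depending on the naming pattern.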
Matthew Lundberg