Using rowSums and subset to clean data

Question

I am having trouble subsetting a large data frame. I have 5,000 observations and 60+ columns. I want to subset based on ~30 columns -- essentially, to drop any observation where the sum of the values in these columns of interest is 0. A small sample is below; by that rule I would want to get rid of UID #1, #3, and #4.

UID  236.1(b)  261.5(c)  261.5(d)
  1         0         0         0
  2         2         3         0
  3         0         0         0
  4         0         0         0

I have tried the following code:

sub <- subset(df, rowSums(df[, 29:60]>0))

which generated the following error:

Error in subset.data.frame(merge_charge, rowSums(merge_charge[, 29:60] > : 'subset' must be logical

and:

 test <- subset(rowSums(df[,29:60]>0))

which generated the following error:

Error in subset.default(rowSums(merge_charge[, 29:60] > 0)) : argument "subset" is missing, with no default

Any suggestions or pointers would be most appreciated.

Correction: sub <- subset(df, rowSums(df[, 29:60]) > 0). The closing parenthesis of rowSums() is in the wrong position in your code. -- 9Heads, Sep 24 '16 at 05:22

score 3 · Answer 1 · answered Sep 24 '16 at 05:31

First, take a look at the subset() function. It is used like this:

subset(data, condition)

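In your second attempt, you passed only the rowSums() result and left out the data frame, so subset() had no condition to apply.

Second, the closing parenthesis of rowSums() is in the wrong place. With the comparison inside, rowSums(df[, 29:60] > 0) returns a numeric count of positive cells per row rather than a logical vector, which is why subset() complains that 'subset' must be logical. The comparison belongs outside: rowSums(df[, 29:60]) > 0. Therefore, it will be: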

test <- subset(your_data, rowSums(your_data[,29:60])>0 )
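
As a minimal sketch, assuming a toy data frame that mirrors the sample in the question (the names df, counts, keep and cleaned are illustrative, and columns 2:4 stand in for 29:60), the difference between the two parenthesis placements looks like this:

```r
# Toy data mirroring the sample in the question (illustrative only)
df <- data.frame(
  UID = 1:4,
  `236.1(b)` = c(0, 2, 0, 0),
  `261.5(c)` = c(0, 3, 0, 0),
  `261.5(d)` = c(0, 0, 0, 0),
  check.names = FALSE  # keep the original column names as they are
)

# Comparison inside rowSums(): counts the positive cells per row (numeric)
counts <- rowSums(df[, 2:4] > 0)

# Comparison outside rowSums(): tests each row total (logical)
keep <- rowSums(df[, 2:4]) > 0

# subset() needs the logical version as its condition
cleaned <- subset(df, rowSums(df[, 2:4]) > 0)
```

Here only the row with UID 2 survives. This keeps rows whose total is positive; if the columns could contain negative values or NA, something like rowSums(df[, 2:4] != 0, na.rm = TRUE) > 0 may be closer to what you want.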
