I am having trouble subsetting a large data frame with 5,000 observations and 60+ columns. I want to subset based on ~30 of those columns: essentially, to drop any observation where the sum of the values in these columns of interest is 0. A small sample is below; here I would want to get rid of UIDs 1, 3, and 4, since all of their values are 0.

UID  236.1(b)  261.5(c)  261.5(d)
  1         0         0         0
  2         2         3         0
  3         0         0         0
  4         0         0         0

I have tried the following code:

sub <- subset(df, rowSums(df[, 29:60]>0))

which generated the following error:

Error in subset.data.frame(merge_charge, rowSums(merge_charge[, 29:60] > : 'subset' must be logical

and:

 test <- subset(rowSums(df[,29:60]>0))

Which generated the following error:

Error in subset.default(rowSums(merge_charge[, 29:60] > 0)) : argument "subset" is missing, with no default

Any suggestions or pointers would be most appreciated.

lmo
Julia
Correction: sub <- subset(df, rowSums(df[, 29:60]) > 0). The closing parenthesis is in the wrong position in your code. – 9Heads Sep 24 '16 at 05:22

1 Answer

First, take a look at subset() function. You can use it like this:

subset(data, condition)

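So in your second attempt, which passes only one argument, the condition is missing: R treats the rowSums() result as the data and finds no subset argument.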

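Second, the closing parenthesis of rowSums() is misplaced. The comparison must come after the sum, as in rowSums(df[, 29:60]) > 0. With > 0 inside the call, rowSums() counts the positive cells in each row and returns a number rather than a logical, which is why subset() reports that 'subset' must be logical. Therefore, it should be: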

test <- subset(your_data, rowSums(your_data[, 29:60]) > 0)
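
To see the difference between the two placements of the parenthesis, here is a minimal sketch on toy data shaped like the sample in the question. The column names x1, x2, x3 and the index vector cols are made up for illustration (cols stands in for 29:60 in the real data), and the values are assumed to be non-negative counts, so a row sum of 0 means every value in the row is 0.

```r
# Toy data shaped like the sample in the question
# (x1, x2, x3 are hypothetical names, not the real column names)
df <- data.frame(
  UID = 1:4,
  x1  = c(0, 2, 0, 0),
  x2  = c(0, 3, 0, 0),
  x3  = c(0, 0, 0, 0)
)

cols <- 2:4  # stands in for 29:60 in the real data

# Misplaced parenthesis: `> 0` is applied to each cell first, so rowSums()
# returns the number of positive cells per row (numeric, not logical)
n_positive <- rowSums(df[, cols] > 0)

# Correct placement: sum each row first, then compare, which yields
# one TRUE/FALSE per row
keep <- rowSums(df[, cols]) > 0

# subset() accepts the logical vector as its condition
sub <- subset(df, keep)

# Plain indexing with the same logical vector selects the same rows
sub2 <- df[keep, ]

print(sub)
```

On this toy data only the row with UID 2 is kept. If the real columns may contain NA, rowSums() has an na.rm argument that may be worth setting, depending on how missing values should be treated.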
Chau Pham