How to filter a data.frame by rows that have more than n entries in a column

Question

How can I remove rows from the dataset that have more than n genes in the Genes column?

data1 <-
Re_leve  logp  chr  start   end     CNA   Genes
1        1.5   1    739400  756200  gain  Trp1,Eggier
1        8.3   1    127730  128210  gain  Zranb3,R3hdm1,.....

score 2 · Accepted Answer · answered Feb 11 '15 at 12:56

You may try

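library(stringr)
n <- 1
# df1 stands for the question's data frame (data1)
# genes per row = number of commas + 1; keep rows that do NOT exceed n
df1[!(str_count(df1$Genes, ",") + 1 > n), ]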
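For readers without stringr, the same count can be sketched in base R. This is only an illustration, not part of the original answer: the third row, the gene names Abc1 and Xyz2, and the value n = 2 are made up for the example. It also checks that R reads the unparenthesized form `!a + 1 > n` as `!((a + 1) > n)`, because unary `!` binds more loosely than `+` and `>`.

```r
# Sample rows modeled on the question; the third row and the gene names
# Abc1 and Xyz2 are hypothetical, added only for illustration.
df1 <- data.frame(
  chr   = c(1, 1, 2),
  CNA   = c("gain", "gain", "loss"),
  Genes = c("Trp1,Eggier", "Zranb3,R3hdm1,Abc1", "Xyz2"),
  stringsAsFactors = FALSE
)
n <- 2

# Base R stand-in for stringr::str_count(df1$Genes, ","):
# find every literal comma, then count the matches per row.
n_commas <- lengths(regmatches(df1$Genes,
                               gregexpr(",", df1$Genes, fixed = TRUE)))
n_genes <- n_commas + 1

# Keep rows with at most n genes, i.e. drop rows with more than n.
kept <- df1[n_genes <= n, ]

# The negated form without parentheses selects the same rows.
same_rows <- identical(!n_commas + 1 > n, n_genes <= n)
```

With these values, `kept` holds the first and third rows and `same_rows` is TRUE.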

akrun

@beginner Could you be a bit more specific? – akrun Feb 11 '15 at 13:08
@beginner Are you not asking the same question again? http://stackoverflow.com/questions/28109327/subset-only-those-rows-whose-intervals-does-not-fall-within-another-data-frame – zx8754 Feb 11 '15 at 13:10
@beginner Have you tried the solutions in the link? Also, this question and its description are completely different from your follow-up question. – akrun Feb 11 '15 at 13:13
Here I want to remove the overlaps within the same dataset, not compare two datasets – beginner Feb 11 '15 at 13:15
I can post it as a different question – beginner Feb 11 '15 at 13:15
@beginner That will be better and when you post, please include the link and mention why it didn't meet your expectations. – akrun Feb 11 '15 at 13:19

score 2 · Answer 2 · answered Feb 11 '15 at 13:05

Try this:

#dummy data
data1 <- data.frame(x=1:3,
                    Gene=c("asdf,asdf,ee,d","asdf","dfd,sdf"),
                    stringsAsFactors = FALSE)

#threshold on the number of genes
n <- 1

#subset: keep rows with more than n genes (use <= n to remove them instead)
data1[sapply(data1$Gene, function(i) length(unlist(strsplit(i, ",")))) > n, ]

#   x           Gene
# 1 1 asdf,asdf,ee,d
# 3 3        dfd,sdf
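The subset above keeps the rows with more than n genes, while the question asks to remove them. A minimal sketch of both directions on the same dummy data follows, assuming R 3.2.0 or later so that `lengths()` is available; the names `gene_count`, `more_than_n` and `at_most_n` are introduced here only for illustration.

```r
# Same dummy data as in the answer above.
data1 <- data.frame(x = 1:3,
                    Gene = c("asdf,asdf,ee,d", "asdf", "dfd,sdf"),
                    stringsAsFactors = FALSE)
n <- 1

# strsplit() is vectorized and returns one character vector per row;
# lengths() counts the elements of each, so sapply() is not needed.
gene_count <- lengths(strsplit(data1$Gene, ",", fixed = TRUE))

# Rows with more than n genes (rows 1 and 3, as in the answer).
more_than_n <- data1[gene_count > n, ]

# Rows left after removing those with more than n genes (row 2).
at_most_n <- data1[gene_count <= n, ]
```

Which comparison to use depends on whether the long gene lists are the rows to keep or the rows to drop.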
