
How can I remove rows from a dataset that have more than n genes?

data1 <-
Re_leve  logp  chr  start   end     CNA   Genes
1        1.5   1    739400  756200  gain  Trp1,Eggier
1        8.3   1    127730  128210  gain  Zranb3,R3hdm1,.....
beginner

2 Answers


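You may try counting the commas in each Genes entry (gene count = commas + 1) and keeping only the rows with at most n genes:

library(stringr)
n <- 1  # maximum number of genes allowed per row
# df1 is the data frame from the question (called data1 there)
df1[str_count(df1$Genes, ",") + 1 <= n, ]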
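If stringr is not available, the same comma-counting idea can be sketched in base R. This is only an illustration: the data frame below is hypothetical sample data modeled on the question's columns (the gene names GeneA to GeneF are placeholders), and it assumes every row lists at least one gene.

```r
# Hypothetical sample data; gene names are placeholders, not from the question
df1 <- data.frame(
  chr   = c(1, 1, 2),
  CNA   = c("gain", "gain", "loss"),
  Genes = c("GeneA,GeneB", "GeneC,GeneD,GeneE", "GeneF"),
  stringsAsFactors = FALSE
)

n <- 2  # maximum number of genes allowed per row

# Find every literal comma in each string; rows without commas give length 0
comma_hits <- regmatches(df1$Genes, gregexpr(",", df1$Genes, fixed = TRUE))

# Gene count = number of commas + 1 (assumes no empty Genes entries)
gene_count <- lengths(comma_hits) + 1

# Keep only rows with at most n genes, i.e. drop rows with more than n
kept <- df1[gene_count <= n, ]
print(kept)
```

With n set to 2, the row listing three genes is dropped and the other two rows are kept.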
akrun
  • @beginner Could you be a bit more specific? – akrun Feb 11 '15 at 13:08
  • @beginner Are you not asking the same question again? http://stackoverflow.com/questions/28109327/subset-only-those-rows-whose-intervals-does-not-fall-within-another-data-frame – zx8754 Feb 11 '15 at 13:10
  • @beginner Have you tried the solutions in the link? Also, this question and its description are completely different from your follow-up question. – akrun Feb 11 '15 at 13:13
  • Here I want to remove the overlaps within the same dataset, not compare against another one. – beginner Feb 11 '15 at 13:15
  • I can post it as a different question – beginner Feb 11 '15 at 13:15
  • @beginner That will be better and when you post, please include the link and mention why it didn't meet your expectations. – akrun Feb 11 '15 at 13:19

Try this:

#dummy data
data1 <- data.frame(x=1:3,
                    Gene=c("asdf,asdf,ee,d","asdf","dfd,sdf"),
                    stringsAsFactors = FALSE)

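#threshold for the number of genes
n <- 1

#subset: this keeps the rows with more than n genes;
#use <= n instead to remove those rows, as the question asks
data1[sapply(data1$Gene, function(i) length(unlist(strsplit(i, ",")))) > n, ]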

#   x           Gene
# 1 1 asdf,asdf,ee,d
# 3 3        dfd,sdf
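The subset above keeps the rows with more than n genes. For the filter the question asks for (removing those rows), a vectorized variant might look like the sketch below. It reuses the dummy data from this answer; the names gene_count and filtered are introduced here for illustration, and lengths() requires R 3.2.0 or later.

```r
# Same dummy data as in the answer
data1 <- data.frame(x = 1:3,
                    Gene = c("asdf,asdf,ee,d", "asdf", "dfd,sdf"),
                    stringsAsFactors = FALSE)

n <- 2  # maximum number of genes allowed per row

# strsplit() is vectorized over the column; lengths() counts the pieces per row
gene_count <- lengths(strsplit(data1$Gene, ",", fixed = TRUE))

# Keep rows with at most n genes, i.e. remove rows with more than n
filtered <- data1[gene_count <= n, ]
print(filtered)
```

One design difference from counting commas: strsplit() drops a trailing empty piece, so an entry such as "a,b," counts as two genes here but as three when counting commas and adding one.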
zx8754