
I am using dplyr and would like to filter my data frame (biotype) by the sample IDs in its first column, which look like this:

ID
chrX.tRNA494-SerAGA 
chrX.tRNA636-AlaCGC
mmu_piR_000007
...

I want to keep the rows whose ID starts with "chr" and drop the rows whose ID starts with "mmu". This is what I tried:

biotype<- biotype %>% 
  filter( str_detect (biotype, "^chr") ==TRUE )
biotype

Can anyone help, please? I am looking for something like a * wildcard that lets me select all rows whose ID starts with particular characters.

Ben Bolker
Anna
  • So your desired output would include `chrX.tRNA494-SerAGA` and `chrX.tRNA636-AlaCGC` but not include `mmu_piR_000007`? Is that correct? (It's always helpful to include example input and output in your question.) – Loren Dec 02 '17 at 15:29
  • Yes exactly. Sorry you are right I should have been more precise – Anna Dec 02 '17 at 16:31

2 Answers


I think you were very close already.

library(dplyr)    # provides filter() and %>%
library(stringr)  # provides str_detect()
biotype %>% filter(str_detect(ID, "^chr"))

(You need to pass the column name, ID, rather than the data frame name, and == TRUE is superfluous because str_detect() already returns a logical vector.)
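
For anyone who wants to try this out, here is a minimal reproducible sketch. The toy data frame is an assumption built from the three IDs shown in the question (the real data will have more rows and columns), and the names chr_rows and mmu_rows are made up for illustration.

```r
library(dplyr)
library(stringr)

# Toy stand-in for the real data: only the ID column, using the IDs
# quoted in the question (assumed layout, not the actual data set)
biotype <- data.frame(
  ID = c("chrX.tRNA494-SerAGA", "chrX.tRNA636-AlaCGC", "mmu_piR_000007"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose ID starts with "chr"; "^" anchors the
# pattern to the beginning of the string
chr_rows <- biotype %>% filter(str_detect(ID, "^chr"))

# The complementary selection: negate the logical vector with "!"
mmu_rows <- biotype %>% filter(!str_detect(ID, "^chr"))
```

Assigning the result to a new name rather than overwriting biotype keeps the original data available, which can help while checking that the pattern matches what you expect.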

Ben Bolker

What about grepl?

biotype <- biotype %>%
    filter(grepl('^chr', ID))
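
If dplyr is not needed otherwise, the same selection can probably be done in base R alone. The sketch below assumes the same toy data frame as in the question's example IDs; keep_regex, keep_prefix, and chr_rows are illustrative names. Note that startsWith() matches a literal prefix (no regular expression) and expects a character vector, so a factor column would need as.character() first.

```r
# Toy stand-in for the real data (assumed: a single character ID column)
biotype <- data.frame(
  ID = c("chrX.tRNA494-SerAGA", "chrX.tRNA636-AlaCGC", "mmu_piR_000007"),
  stringsAsFactors = FALSE
)

# Regular-expression match anchored at the start of each ID
keep_regex <- grepl("^chr", biotype$ID)

# Literal prefix match; needs a character vector, not a factor
keep_prefix <- startsWith(biotype$ID, "chr")

# Logical indexing on rows; drop = FALSE keeps the result a data frame
# even though there is only one column
chr_rows <- biotype[keep_regex, , drop = FALSE]
```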
smci
Birger