
I apologize in advance if the title of this post isn't accurate. I know this is probably a very easy question, and if I knew the correct terminology I could probably find a previous post about it.

So what I am attempting to do is filter my data using dplyr for a gene family. Here is an example so it makes a bit more sense.

I have a gene family called ADCY, which is made up of 10 separate genes. The family looks like this:

ADCY1
ADCY2
ADCY3
ADCY4
ADCY5
ADCY6
ADCY7
ADCY8
ADCY9
ADCY10 

I know I can do something like this but it is kind of annoying to have to type out all 10 genes, especially when I have a bunch of other gene families I want to look at.

genes <- c("ADCY1", "ADCY2", "ADCY3", "ADCY4", "ADCY5", "ADCY6", "ADCY7", 
           "ADCY8", "ADCY9", "ADCY10")


df_filtered <- df %>%
                 filter(symbol %in% genes)

I was wondering if there is a way to use dplyr to filter on just the start of the gene name, if that makes sense. I know there is a starts_with("ADCY") that I can use, but my R session crashes when I try to use it inside filter(). Does anyone have a solution?

Ronak Shah
neuron
  • `starts_with` is helpful when you want to select columns whose names match a pattern, not when you want to select values of a specific variable/column. Try something from the `grep` family of functions for your case. – AntoniosK Aug 30 '18 at 13:27
  • `df %>% filter(grepl("^ADCY", V1))` ? – Ronak Shah Aug 30 '18 at 13:28
  • or `df %>% filter(str_detect(symbol, "^ADCY"))` – Maurits Evers Aug 30 '18 at 13:29
  • @AntoniosK Thanks for the suggestion! I will keep that in mind in the future!! I really appreciate the help! Worked like a charm – neuron Aug 30 '18 at 13:36
  • @RonakShah Thanks for the help!! That worked like a charm!! Thank you so much for the help – neuron Aug 30 '18 at 13:37
  • @MauritsEvers That also worked! Had to install an extra package but that's okay! I really appreciate the help! – neuron Aug 30 '18 at 13:37
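
To pull the comment suggestions together, here is a minimal sketch of prefix filtering on row values. The toy `df` and its `value` column are made up for illustration; only the `symbol` column name comes from the question. `startsWith()` is base R and avoids regular expressions entirely. One caveat worth checking in real data: a bare `^ADCY` prefix also matches symbols such as ADCYAP1, so a stricter pattern may be closer to what you want.

```r
library(dplyr)

# Hypothetical toy data; `symbol` mirrors the column name used in the question.
df <- data.frame(
  symbol = c("ADCY1", "ADCY10", "BRCA1", "TP53", "ADCY5", "ADCYAP1"),
  value  = c(1.2, 0.4, 3.1, 2.2, 0.9, 1.7),
  stringsAsFactors = FALSE
)

# Regex match anchored at the start of the string ("^").
by_grepl <- df %>% filter(grepl("^ADCY", symbol))

# Base R prefix test, no regex involved (expects a character vector).
by_prefix <- df %>% filter(startsWith(symbol, "ADCY"))

# Stricter: the prefix followed only by digits, which drops ADCYAP1.
by_strict <- df %>% filter(grepl("^ADCY[0-9]+$", symbol))

# Several families at once: join the prefixes into one anchored pattern.
families  <- c("ADCY", "BRCA")  # hypothetical list of family prefixes
pattern   <- paste0("^(", paste(families, collapse = "|"), ")")
by_family <- df %>% filter(grepl(pattern, symbol))
```

`stringr::str_detect(symbol, "^ADCY")` should behave like the `grepl()` call here, at the cost of an extra package.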

1 Answer


You can use good old paste0 (base R, no dependency required) to build the vector of gene names instead of typing it out:

paste0("ADCY", 1:10)
[1] "ADCY1"  "ADCY2"  "ADCY3"  "ADCY4"  "ADCY5"  "ADCY6"  "ADCY7"  "ADCY8"  "ADCY9" 
[10] "ADCY10"
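
As a rough sketch of how that vector could plug into the filter from the question (the toy `df` below is an assumption for illustration, not the asker's data):

```r
library(dplyr)

# Hypothetical toy data; `symbol` mirrors the column name used in the question.
df <- data.frame(
  symbol = c("ADCY1", "ADCY10", "ADCY11", "BRCA1"),
  stringsAsFactors = FALSE
)

# Build the ten family members instead of typing them out.
genes <- paste0("ADCY", 1:10)

# Exact matching: only symbols that appear in `genes` are kept.
df_filtered <- df %>%
  filter(symbol %in% genes)
```

Unlike a prefix match, this keeps only the exact names generated, so the made-up ADCY11 row is dropped.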
Vincent Bonhomme