
I want to match multiple string criteria and then subset the matching rows in R, using `grepl` to find the matches. I found a nice solution in another post (the code is specific to that post, but you get the idea): `subset(GEMA_EO5, grepl(paste(l, collapse="|"), GEMA_EO5$RefSeq_ID))`

I am wondering whether it is possible to search two columns instead of just RefSeq_ID as in the example above, either with `grepl` or by any other method. In other words, I would like to look for the options in `l` not in one column only, but in two (or however many). Is this possible?

E.g., with 3 columns a, b and c, I would like criteria such that T (rows 3 and 4) is selected, despite the format "T I" in (3,b). It should identify both (4,a) and (3,b), hence the link to the previous question. I want it to look in column a AND column b, not just one or the other.

    a    b     c

    A    A C   P L
    V    V B   W E E
    W    T I   P J G
    T    W P   J
kirk
  • It sounds like you could just use `|` to combine the results of multiple calls to `grepl`. Or melt your data frame and make one sweep through. Do you have a more concrete example? – Peyton Jun 03 '13 at 13:25
  • You may also be able to just paste the columns together. – Peyton Jun 03 '13 at 13:29
  • @Peyton I have edited the post to include an example – kirk Jun 03 '13 at 13:42
  • So, just to be clear, the question has nothing to do with string matching? Might be worth changing the title. And the tags. – alexwhan Jun 03 '13 at 13:46
  • Yes, if you're just working with numbers, you don't need `grep`. – Peyton Jun 03 '13 at 13:49
  • If you search for logical indexing you'll find lots of examples. This is absolutely not 'advanced string matching' – alexwhan Jun 03 '13 at 13:51
  • Sorry, of course you are all right. I have modified the example. I was trying to simplify but obviously lost the purpose in the process – kirk Jun 03 '13 at 13:54
  • In your example, alexwhan's solution would work then. You could also just paste the two columns together and use a single call to `grepl` (_in your example_--things change if you need to match the beginning and end of the string, for instance). – Peyton Jun 03 '13 at 13:57
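The paste-the-columns idea from the comments could look roughly like the sketch below. The data frame is reconstructed from the question's table; the names `df`, `combined` and `hits`, and the one-element `l`, are assumptions made for illustration.

```r
# Example data reconstructed from the table in the question
df <- data.frame(
  a = c("A", "V", "W", "T"),
  b = c("A C", "V B", "T I", "W P"),
  c = c("P L", "W E E", "P J G", "J"),
  stringsAsFactors = FALSE
)

# Hypothetical criteria; in the question this is the vector `l`
l <- c("T")
pattern <- paste(l, collapse = "|")

# Join columns a and b into one string per row, then search once
combined <- paste(df$a, df$b, sep = " ")
hits <- df[grepl(pattern, combined), ]
hits
```

As the comment warns, this only behaves like separate per-column searches for unanchored patterns: `^` and `$` would apply to the combined string, not to each column.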

1 Answer


Here's some demo data to show how this works:

set.seed(1234)
dat <- data.frame(A = sample(letters[1:3],10,TRUE),
                  B = sample(letters[1:3],10,TRUE))

Using `[` to subset makes this a lot clearer in my opinion: we can use `grepl` to get a logical vector based on a match, and use `|` to combine two tests (on multiple columns). If you wanted a subset of all the rows that contain an 'a' in either column:

dat.a <- dat[with(dat, grepl("a", A) | grepl("a", B)), ]
dat.a
  A B
1 b a
2 b a
3 a c
5 a a
9 a a
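For "however many" columns, one possible generalization is to run `grepl` over each column and fold the results together with `|`. This is only a sketch: the data frame is reconstructed from the question's table, and `cols`, `matches`, `keep` and the contents of `l` are illustrative assumptions.

```r
# Example data reconstructed from the table in the question
df <- data.frame(
  a = c("A", "V", "W", "T"),
  b = c("A C", "V B", "T I", "W P"),
  c = c("P L", "W E E", "P J G", "J"),
  stringsAsFactors = FALSE
)

# Hypothetical criteria, collapsed into a single alternation pattern
l <- c("T")
pattern <- paste(l, collapse = "|")

# Columns to search; extend this vector to cover more columns
cols <- c("a", "b")

# One logical vector per column, then combine them element-wise with OR
matches <- lapply(df[cols], function(x) grepl(pattern, x))
keep <- Reduce(`|`, matches)

df[keep, ]
```

With `l <- c("T")` this keeps rows 3 and 4, the rows the question asks for. Adding a column name to `cols` is the only change needed to search more columns.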
alexwhan
  • Thanks, this works if I use `df[grepl(paste(l, collapse="|"), df$c) | grepl(paste(l, collapse="|"), df$b), ]` to account for the string spacing – kirk Jun 03 '13 at 14:03
  • Here `l` is the list of string criteria to match, as in the linked post in the question – kirk Jun 03 '13 at 14:04
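If the criteria should match whole space-separated tokens only (so that `T` does not also match `TI`), one option is to wrap the alternation in word boundaries. A sketch, where the vector `x` is a hypothetical stand-in for a column and `l` is assumed to hold plain strings without regex metacharacters:

```r
# Hypothetical criteria and a stand-in for one column of the data frame
l <- c("T", "V")
x <- c("T I", "TI", "W P", "V B", "AV")

# Plain alternation: matches the criteria anywhere in the string
loose <- grepl(paste(l, collapse = "|"), x)

# Alternation wrapped in \b word boundaries: matches whole tokens only
pattern <- paste0("\\b(", paste(l, collapse = "|"), ")\\b")
whole <- grepl(pattern, x, perl = TRUE)

data.frame(x, loose, whole)
```

Here `"TI"` and `"AV"` match the loose pattern but not the word-bounded one.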