
I am using `str_match` from the stringr package to capture text between parentheses.

library(stringr)

strs = c("P5P (abcde) + P5P (fghij)", "Glcext (abcdef)")
str_match(strs, "\\(([a-z]+)\\)")

This gives me only the first match in each string, "abcde" and "abcdef". How can I capture "fghij" as well, while still using the same regex for both strings?

smci
user1981275
  • Does `str_match_all(strs, "\\(([a-z]+)\\)")` do what you're wanting? – Josh O'Brien Jan 18 '13 at 17:45
  • Works perfectly, thanks! I did not know about it because it was not linked in the "See Also:" section of the help page... – user1981275 Jan 18 '13 at 17:48
  • That *would* be a good addition to the help file. In my experience, it's often worth trying something like this to see what else is available: `ls("package:stringr")`. – Josh O'Brien Jan 18 '13 at 17:51

1 Answer

str_extract_all(strs, "\\(([a-z]+)\\)")

or, as @JoshO'Brien mentions in his comment,

str_match_all(strs, "\\(([a-z]+)\\)")

This can just as easily be accomplished with base R:

regmatches(strs, gregexpr("\\(([a-z]+)\\)", strs))
Matthew Plourde
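
As an illustration of the `str_match_all` suggestion (a minimal sketch, not part of the original answer): `str_match_all` returns one matrix per input string, with the full match in the first column and the capture group in the second. The variable names `m` and `inner` are only placeholders chosen for this example.

```r
library(stringr)

strs <- c("P5P (abcde) + P5P (fghij)", "Glcext (abcdef)")

# One matrix per input string: column 1 holds the full match
# (including the parentheses), column 2 holds the capture group.
m <- str_match_all(strs, "\\(([a-z]+)\\)")

# Keep only the capture-group column from each matrix.
inner <- lapply(m, function(x) x[, 2])
inner
```

Taking column 2 is one way to get the text without the parentheses while keeping the original pattern unchanged.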
  • Works well, thanks. But why are the brackets also captured with `regmatches`? – user1981275 Jan 18 '13 at 17:52
  • `str_match_all` returns both matches and groups within your matches. `str_extract_all` and `regmatches`/`gregexpr` return just the matches. If you want these last two methods to return just what's inside the `()`, use `pat = "(?<=\\()[a-z]+(?=\\))"` with `str_extract_all(strs, perl(pat))` or `regmatches(strs, gregexpr(pat, strs, perl=TRUE))`. This pattern uses lookahead and lookbehind assertions. See the end of the Perl section in `?regex` for more details on what's going on here. – Matthew Plourde Jan 18 '13 at 18:03
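
The difference described in the last comment can be sketched in base R alone. This is a hedged illustration that reuses the `strs` vector from the question; `inner` and `full` are names introduced only for this example. As far as I know, recent stringr versions are built on an ICU regex engine that supports lookarounds natively, so the `perl()` wrapper from the comment may no longer be needed there; treat that as an assumption and check your installed version.

```r
strs <- c("P5P (abcde) + P5P (fghij)", "Glcext (abcdef)")

# (?<=\\() requires a "(" immediately before the match and
# (?=\\)) requires a ")" immediately after it. Neither parenthesis
# is consumed, so only the letters end up in the match.
pat <- "(?<=\\()[a-z]+(?=\\))"

# perl = TRUE is needed here: R's default regex engine does not
# support lookahead or lookbehind assertions.
inner <- regmatches(strs, gregexpr(pat, strs, perl = TRUE))

# For comparison: without lookarounds the parentheses are part of
# the match, so they are returned too.
full <- regmatches(strs, gregexpr("\\([a-z]+\\)", strs))

inner
full
```

Both calls return a list with one character vector per input string, so the second match in the first string is kept in either case.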