
Consider the following example:

x <- "something('pineapple', 'orange', y = c('peach', 'banana'), z = 'lemon'), something(v = c('apple', 'pear'), z = c('cherry', 'strawberry', 'grape'))"

I want to extract the segments enclosed by something( and its matching ). In this example, the function should return "something('pineapple', 'orange', y = c('peach', 'banana'), z = 'lemon')" and "something(v = c('apple', 'pear'), z = c('cherry', 'strawberry', 'grape'))". There can be any number of nested parentheses inside something(), so I cannot simply extract everything from something( to the next ). That would, for example, return the first segment as "something('pineapple', 'orange', y = c('peach', 'banana')".

I am essentially looking to fill in the regex placeholder in

stringr::str_extract_all(x, "something\\(<text until matching parenthesis>")
Chr

2 Answers


We can use strsplit with a regex looking for

  • , a comma
  • \\s* zero or more spaces
  • (?=something) a positive lookahead for "something"

strsplit(x, ',\\s*(?=something)', perl=TRUE)[[1]]
# [1] "something('pineapple', 'orange', y = c('peach', 'banana'), z = 'lemon')"  
# [2] "something(v = c('apple', 'pear'), z = c('cherry', 'strawberry', 'grape'))"
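
One caveat worth checking before relying on the split: it assumes that ", something" only ever appears between top-level segments. The sketch below uses a made-up input (x_nested is hypothetical, not from the question) in which one something() call is nested inside another, to show how the split then cuts through the outer call.

```r
# Hypothetical input (not from the question): one call nested inside another.
x_nested <- "something(a, something(b)), something(c)"

# Same split as above: comma, optional whitespace, lookahead for "something".
parts <- strsplit(x_nested, ",\\s*(?=something)", perl = TRUE)[[1]]
print(parts)
# [1] "something(a"   "something(b))" "something(c)"
```

If nesting like this cannot occur in the data, the strsplit approach is the simpler of the two options.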

Data:

x <- "something('pineapple', 'orange', y = c('peach', 'banana'), z = 'lemon'), something(v = c('apple', 'pear'), z = c('cherry', 'strawberry', 'grape'))"
jay.sf

You can use regex recursion, (?1) or (?R), to match balanced or nested constructs. This requires perl=TRUE.

regmatches(x, gregexpr("something(\\(([^()]|(?1))*\\))", x, perl=TRUE))
#[[1]]
#[1] "something('pineapple', 'orange', y = c('peach', 'banana'), z = 'lemon')"  
#[2] "something(v = c('apple', 'pear'), z = c('cherry', 'strawberry', 'grape'))"

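Here (?1) recurses into the first capture group, (\\(([^()]|(?1))*\\)), which matches an opening parenthesis, then any run of non-parenthesis characters or nested balanced groups, then the matching closing parenthesis. As a simpler illustration of recursion, a(?R)?z matches one or more letters a followed by exactly the same number of letters z.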
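
For reuse, the recursive pattern can be wrapped in a small helper. This is only a sketch: extract_calls and its name argument are hypothetical names introduced here, and it assumes that name contains no regex metacharacters or capture groups (otherwise (?1) would point at the wrong group). It stays with base R and perl = TRUE because, as far as I know, the ICU engine behind stringr does not support recursion. Parentheses inside quoted strings, such as 'a)', are not handled.

```r
# Hypothetical helper: extract every balanced name(...) segment from a string.
# Assumes `name` is a plain literal with no regex metacharacters or groups.
extract_calls <- function(x, name = "something") {
  # Group 1 is "(" + (non-parenthesis chars or a nested group 1)* + ")".
  pattern <- paste0(name, "(\\(([^()]|(?1))*\\))")
  regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
}

x <- "something('pineapple', 'orange', y = c('peach', 'banana'), z = 'lemon'), something(v = c('apple', 'pear'), z = c('cherry', 'strawberry', 'grape'))"
res <- extract_calls(x)
print(res)

# Deeper nesting and an empty argument list (made-up input).
deep <- extract_calls("something(a(b(c)), d) and something()")
print(deep)
# [1] "something(a(b(c)), d)" "something()"

# The simpler recursion: equal numbers of a's and z's.
az <- regmatches("aaazzz", regexpr("a(?R)?z", "aaazzz", perl = TRUE))
print(az)
# [1] "aaazzz"
```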

GKi