I'd like to extract the name John Doe from the following string:

str <- 'Name: |             |John Doe     |'

I can do:

library(stringr)
str_extract(str,'(?<=Name: \\|             \\|).*(?=     \\|)')
[1] "John Doe"

But that involves typing a lot of spaces, and it breaks when the number of spaces is not fixed. When I try to use a quantifier (+) instead, I get an error:

str_extract(str,'(?<=Name: \\| +\\|).*(?= +\\|)')
Error in stri_extract_first_regex(string, pattern, opts_regex = opts(pattern)) : 
  Look-Behind pattern matches must have a bounded maximum length. (U_REGEX_LOOK_BEHIND_LIMIT, context=`(?<=Name: \| +\|).*(?= +\|)`)

The same goes for other variants:

str_extract(str,'(?<=Name: \\|\\s+\\|).*(?=\\s+\\|)') 
str_extract(str,'(?<=Name: \\|\\s{1,}\\|).*(?=\\s{1,}\\|)')

Is there a solution to this?

msn

3 Answers

How about this: first remove Name, then replace every non-alphanumeric character with a space, and finally collapse the remaining whitespace with str_squish():

library(stringr)

str_squish(str_replace_all( str_remove(str, "Name"), "[^[:alnum:]]", " "))
[1] "John Doe"
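
The same pipeline can be split into its three steps to show what each call contributes. This is only a sketch: the intermediate variable names are made up for illustration, and the second input (a name containing an apostrophe) is a hypothetical case added to show a limitation of replacing every non-alphanumeric character.

```r
library(stringr)

str <- 'Name: |             |John Doe     |'

no_label <- str_remove(str, "Name")                         # drop the literal label
spaced   <- str_replace_all(no_label, "[^[:alnum:]]", " ")  # ':' and '|' become spaces
name     <- str_squish(spaced)                              # trim and collapse whitespace
name
# [1] "John Doe"

# Hypothetical input: punctuation inside the name is replaced as well
other <- "Name: |   |Mary O'Brien   |"
other_name <- str_squish(str_replace_all(str_remove(other, "Name"), "[^[:alnum:]]", " "))
other_name
# [1] "Mary O Brien"
```

If names may contain apostrophes or hyphens, one of the pattern-based answers is probably a better fit.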
TarJae

Another solution using base R:

sub("Name: \\|\\s+\\|(.*\\S)\\s+\\|", "\\1", str)
# [1] "John Doe"
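
Note that sub() is vectorised and returns its input unchanged when the pattern does not match. A minimal sketch of how that could be handled for several inputs with varying padding follows; extract_name is a hypothetical helper name, and the sample vector x is invented for illustration.

```r
# Hypothetical helper: returns NA where the pattern does not match
extract_name <- function(x) {
  pattern <- "Name: \\|\\s+\\|(.*\\S)\\s+\\|"
  ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
}

x <- c('Name: | |John Doe |',
       'Name: |             |Jane Roe        |',
       'no name here')
result <- extract_name(x)
result
# [1] "John Doe" "Jane Roe" NA
```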

You might also use \K to keep what has been matched so far out of the overall match. \K is a PCRE feature, which is why the example below passes perl=TRUE.

Name: \|\h+\|\K.*?(?=\h+\|)

Explanation

  • Name: \| match Name: |
  • \h+\| Match 1+ horizontal whitespace characters and |
  • \K Forget what is matched so far
  • .*? Match as few characters as possible
  • (?=\h+\|) Positive lookahead, assert 1+ horizontal whitespace characters to the right followed by |

See a regex demo and an R demo.

Example

str <- 'Name: |             |John Doe     |'
regmatches(str, regexpr("Name: \\|\\h+\\|\\K.*?(?=\\h+\\|)", str, perl=TRUE))

Output

[1] "John Doe"
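
Since the question uses stringr, a capture group is another way to sidestep the lookbehind entirely. The following is a sketch of that alternative, not part of the \K approach above: str_match() returns a character matrix whose first column is the full match and whose second column is the first capture group.

```r
library(stringr)

str <- 'Name: |             |John Doe     |'

# Column 1 is the full match, column 2 the first capture group
name <- str_match(str, "Name: \\|\\s+\\|(.*?)\\s+\\|")[, 2]
name
# [1] "John Doe"
```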
The fourth bird