I'm trying to extract the text before and after "/", with no success. A sample sentence:

XXXX YYY ZZZ - AV HAHEHRS, 3061 - SDDW ASDA DDSF - SAO JOSE DOS CAMPOS / SP - CEP: 00000-000

Output should be

SAO JOSE DOS CAMPOS / SP

I tried str_extract(str, "- [a-zA-Z]{1,} / [a-zA-Z]{1,}"), but it only returns

CAMPOS / SP
  • FWIW, your regex begins with the `-` character. It's unlikely that it returns the match you show. – Tomalak Jan 04 '18 at 05:31

1 Answer

Your character classes are missing the space, so they cannot match a multi-word name. Try:

str_extract(str, "- [a-zA-Z ]+ / [a-zA-Z ]+") 

Note the space in the character class. Also, {1,} is the long form of +.

The match will be "- SAO JOSE DOS CAMPOS / SP " (it stops at the next -, which is not in the character class). To get rid of the leading "- ", remove it in a second step, or use a zero-width look-behind:

str_extract(str, "(?<=- )[a-zA-Z ]+ / [a-zA-Z ]+") 

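Look-behinds are supported by the ICU engine that stringr uses; base R's regexpr and gregexpr need perl = TRUE for them. Because the second character class includes the space, the result keeps the space after SP; trim it with trimws(), or use [a-zA-Z]+ for the second class.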
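
For reference, here is a minimal base R sketch of the look-behind approach, so it runs without stringr. The variable names x, pattern and city_state are illustrative, and the pattern is tightened a little compared to the one above (an assumption about the data: words are separated by single spaces) so that no trailing space ends up in the result.

```r
# Sample input taken from the question.
x <- "XXXX YYY ZZZ - AV HAHEHRS, 3061 - SDDW ASDA DDSF - SAO JOSE DOS CAMPOS / SP - CEP: 00000-000"

# Look-behind for "- ", then one or more words separated by single spaces,
# then " / ", then a single run of letters (no trailing space is captured).
pattern <- "(?<=- )[a-zA-Z]+(?: [a-zA-Z]+)* / [a-zA-Z]+"

# perl = TRUE switches to PCRE; the default engine does not support look-behind.
city_state <- regmatches(x, regexpr(pattern, x, perl = TRUE))
print(city_state)
```

If the input has no match, regmatches() returns character(0) here rather than NA, so check the length before using the result.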


For the sake of completeness, you could do this without regex: Split the input by '-', find the part that contains '/', trim. This might be faster than regex, too.
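
The split-and-filter idea could look roughly like this in base R. It is only a sketch: the names x, parts and city_state are illustrative, and it assumes that exactly one piece between hyphens contains a slash.

```r
# Sample input taken from the question.
x <- "XXXX YYY ZZZ - AV HAHEHRS, 3061 - SDDW ASDA DDSF - SAO JOSE DOS CAMPOS / SP - CEP: 00000-000"

# Split on literal hyphens (fixed = TRUE means no regex is involved).
parts <- strsplit(x, "-", fixed = TRUE)[[1]]

# Keep the piece(s) that contain a slash, then strip surrounding whitespace.
city_state <- trimws(parts[grepl("/", parts, fixed = TRUE)])
print(city_state)
```

The hyphen inside the postal code also produces a split, but that piece contains no slash, so it is filtered out.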

Tomalak