
Really simple, but I can't get the 'greediness' of regex to work the way I want. Say you have:

unlist(stringr::str_extract_all("XXXXSXTXXX","([A-Z]{2}[T|S][A-Z]{2})"))

This gives only the first match:

[1] "XXSXT"

How can I change the regex behaviour to give me both matches, with S and T (without using two separate patterns), like:

[1] "SXTXX" "XXSXT" 
user3375672

1 Answer


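You need to use a lookahead for that, with the perl=TRUE option for matching in R. Because the lookahead itself consumes no characters, the capturing group inside it can return overlapping matches.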

(?=([A-Z]{2}[TS][A-Z]{2}))

See demo.

https://regex101.com/r/cJ6zQ3/23
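As a minimal sketch of how this pattern might be applied in base R (the variable names `x`, `m`, `starts`, `lens` and `overlapping` are only illustrative): the lookahead matches are zero-length, so the text has to be read from the capture-group positions that `gregexpr()` attaches as attributes when `perl = TRUE` is set.

```r
x <- "XXXXSXTXXX"

# Zero-width lookahead wrapping a capturing group; perl = TRUE enables PCRE.
m <- gregexpr("(?=([A-Z]{2}[TS][A-Z]{2}))", x, perl = TRUE)[[1]]

# The overall matches are empty, so take the start and length of group 1.
starts <- attr(m, "capture.start")[, 1]
lens   <- attr(m, "capture.length")[, 1]

# Cut the captured substrings out of the original string.
overlapping <- substring(x, starts, starts + lens - 1)
print(overlapping)
# [1] "XXSXT" "SXTXX"
```

The matches come back in order of their start position, so "XXSXT" (starting at character 3) precedes "SXTXX" (starting at character 5).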

vks
  • Ah, perfect! I tried everything (even lookahead, but the wrong way) – user3375672 Oct 07 '15 at 08:54
  • I think an explanation is due here. It is not just the look-ahead that does the actual job; the look-ahead is a kind of "transport/vehicle" for the real "worker", the capturing group. – Wiktor Stribiżew Oct 07 '15 at 09:00
  • @stribizhev Yes, the `lookahead` is tried at every position in the string, and the `groups` capture whatever it matches there. – vks Oct 07 '15 at 09:02