
Hi, I am wondering whether there is a regular expression that can do the following:

Select all the substrings of a string that:

  • start with & and
  • have n characters after the & (n >= 0)

AND those substrings are NOT any of:

  • `&amp;`
  • `&apos;`
  • `&lt;`
  • `&gt;` or
  • `&quot;`

For example, given the string

'Stewie & Brian    &partners in crime;'

is there a regex that will return only the substring &partners?

My intuition says no, because I would need a context-free grammar, but how can I prove that? Is there a way to test it with the pumping lemma?

Or does such a regex actually exist, and my intuition is just wrong?

Thank you

  • Hi, I've taken the liberty to reformat your post. By placing short texts between backticks, you can format them as code (so you don't need to insert a space between `&` and `apos;` in order to keep the literal text). Longer texts and multi-line code samples can be formatted that way by indenting them by four spaces. Both can be achieved by selecting the text and pressing Ctrl-K. You might want to look at the editor help for more of these formatting tips. Aside from that, welcome to StackOverflow! – Tim Pietzcker Jan 08 '16 at 16:53

1 Answer


Sure:

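&(?!(amp|apos|lt|gt|quot);)\S{4,}

This requires at least four non-whitespace characters after the & (n >= 4); use \S* instead of \S{4,} to allow n >= 0. The alternation lists all five entity names from the question.

See live demo.

The key here is the negative lookahead (?!(amp|apos|lt|gt|quot);), which asserts (without consuming input) that the text immediately following the & is not one of those entity names followed by a semicolon.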
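
As a rough sketch of how the lookahead behaves in practice, here is a small Python version. The names `build_pattern`, `find_candidates` and the `min_chars` parameter are made up for illustration, and the exclusion list is assumed to cover all five entities named in the question (including `quot`).

```python
import re

# Assumption: these are the five entity names the question wants excluded.
EXCLUDED = ("amp", "apos", "lt", "gt", "quot")


def build_pattern(min_chars=0):
    """Compile a regex for '&' plus at least min_chars non-whitespace characters,
    skipping any '&' that is immediately followed by an excluded name and ';'."""
    names = "|".join(EXCLUDED)
    # (?!...) is a negative lookahead: it checks the upcoming text without consuming it.
    return re.compile(rf"&(?!(?:{names});)\S{{{min_chars},}}")


def find_candidates(text, min_chars=0):
    """Return all matching substrings of text, in order of appearance."""
    return build_pattern(min_chars).findall(text)


# The example string from the question, with n = 4 as in the answer.
print(find_candidates("Stewie & Brian    &partners in crime;", min_chars=4))
# prints ['&partners']
```

With `min_chars=0` a lone `&` followed by whitespace also counts as a match, so the minimum length you pick decides whether the bare `&` in the example string is returned.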

Bohemian