I want to write a regular expression that will find instances of three or more consecutive letters of the alphabet. I'm planning to use this regex with both JavaScript and grep. This seems like something that regex should be able to do pretty easily, but I'm having a hard time coming up with a regex that isn't very complicated:

(abc(d(e(f(g(h(i(j(k(l(m(n(o(p(q(r(s(t(u(v(w(x(yz?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)|(bcd(e(f(g(h(i(j(k(l(m(n(o(p(q(r(s(t(u(v(w(x(yz?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)|(cde(f(g(h(i(j(k(l(m(n(o(p(q(r(s(t(u(v(w(x(yz?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)|...

To be clear, I want to match these test cases:

  • abc - match abc
  • def - match def
  • lmnop - match lmnop
  • xxxxghixxxx - match ghi
  • ab - no match (not long enough)
  • zyx - no match (not in order)
  • q r s - no match (intervening characters)
  • tuwx - no match (missing v)

Is there a way to write this regex that doesn't use 20+ levels of nested parentheses and 20+ `|`s?

Stephen Ostermiller
  • You still need to spell out these possibilities. Besides, if the code works and you only want to improve it, please consider posting on Code Review (codereview.stackexchange.com). – Wiktor Stribiżew Jun 24 '22 at 11:17
  • @WiktorStribiżew My code doesn't work, because it is too much to actually write out. I only got three `|`s in before giving up because it is getting too complicated. – Stephen Ostermiller Jun 24 '22 at 11:29

1 Answer

This can be shortened to:

(?:(?=ab|bc|cd|de|ef|fg|gh|hi|ij|jk|kl|lm|mn|no|op|pq|qr|rs|st|tu|uv|vw|wx|xy|yz).){2,}.


  • (?: - Open a non-capture group;
    • (?=ab|bc|cd...). - A positive lookahead asserting that the next character is immediately followed by its alphabetical successor, after which `.` consumes that character;
  • ){2,} - Close non-capture group and match 2+ times;
  • . - Match the final character to conclude the substring.
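To check the pattern against the test cases from the question, a minimal JavaScript sketch could look like the following. The names `runRegex` and `findRuns` are made up for this illustration, and the alternation is generated from the alphabet rather than typed out, which should give the same pattern as above.

```javascript
// Build the lookahead alternation "ab|bc|...|yz" instead of typing it by hand.
const letters = "abcdefghijklmnopqrstuvwxyz";
const pairs = [];
for (let i = 0; i < letters.length - 1; i++) {
  pairs.push(letters[i] + letters[i + 1]);
}

// Same shape as the pattern above; the g flag makes match() return every run.
const runRegex = new RegExp(`(?:(?=${pairs.join("|")}).){2,}.`, "g");

// Hypothetical helper: return all matched runs, or [] when there are none.
function findRuns(text) {
  return text.match(runRegex) || [];
}

// Test cases taken from the question.
const cases = ["abc", "def", "lmnop", "xxxxghixxxx", "ab", "zyx", "q r s", "tuwx"];
for (const input of cases) {
  console.log(JSON.stringify(input), "->", findRuns(input));
}
```

The first four inputs should produce a match (abc, def, lmnop, ghi) and the last four should produce none. As for grep: lookaheads are not part of POSIX basic or extended regular expressions, so with GNU grep this pattern would need the PCRE option (`-P`), where that is available.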
JvdV
  • 1
    I like how easy it is to vary the required match length. If you want to match `n` consecutive characters, you use `n-1` in the curly braces. For example to match 8 consecutive letters you use `{7,}` – Stephen Ostermiller Jun 24 '22 at 12:24
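The `n-1` rule from this comment could be wrapped in a small function. This is only a sketch: `consecutiveRegex` is a hypothetical name, and it assumes lowercase a-z as in the answer.

```javascript
// Hypothetical helper: build a regex that matches runs of at least n
// consecutive lowercase letters, using n-1 as the repetition count.
function consecutiveRegex(n) {
  if (!Number.isInteger(n) || n < 2) {
    throw new RangeError("n must be an integer of at least 2");
  }
  const letters = "abcdefghijklmnopqrstuvwxyz";
  const pairs = [];
  for (let i = 0; i + 1 < letters.length; i++) {
    pairs.push(letters.slice(i, i + 2)); // "ab", "bc", ..., "yz"
  }
  return new RegExp(`(?:(?=${pairs.join("|")}).){${n - 1},}.`, "g");
}

// With n = 8 the quantifier becomes {7,}, as in the comment above.
console.log("abcdefgh".match(consecutiveRegex(8)));
console.log("abcdefg".match(consecutiveRegex(8)));
```

The first call should find the eight-letter run, while the second input is one letter short, so `match` returns `null`.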