I've got a Regular Expression meant to validate that a phone number string is either empty, or contains 10-14 digits in any format. It works for requiring a minimum of 10 but continues to match beyond 14 digits. I've rarely used lookaheads before and am not seeing the problem. Here it is with the intended interpretation in comments:

///  ^                      - Beginning of string
/// (?=                     - Look ahead from current position
///      (?:\D*\d){10,14}       - Match 0 or more non-digits followed by a digit, 10-14 times
///      \D*$                   - Ending with 0 or more non-digits
/// .*                      - Allow any string
/// $                       - End of string
^(?=(?:\D*\d){10,14}\D*|\s*$).*$

This is used in an ASP.NET MVC 5 site with the System.ComponentModel.DataAnnotations.RegularExpressionAttribute, so it runs server-side with .NET regexes and client-side in JavaScript with jQuery Validate. How can I get it to stop matching if the string contains more than 14 digits?

xr280xr
  • You don't need a lookahead. Try `^(?:(?:\D*\d){10,14}\D*)?$`, which will match an empty string or a string that contains `10-14` digits. [Demo](https://regex101.com/r/HakJrt/1/). – Cary Swoveland Apr 27 '20 at 05:16
  • Would you please try: `(?:^|(?<=\D))\d{10,14}(?:(?=\D)|$)`. – tshiono Apr 27 '20 at 05:22
  • tshiono, you can try it yourself by pasting that regex at the link I gave. – Cary Swoveland Apr 27 '20 at 05:24
  • ...but it doesn't work. You were on the right track in the question with `(?:\D*\d){10,14}` for counting the digits. I (basically) just added anchors. Suppose empty strings were not to match. Then the regex I gave would simplify to `^(?:\D*\d){10,14}\D*$`. – Cary Swoveland Apr 27 '20 at 05:31
  • @CarySwoveland oh...good point. Thank you. I think when I wrote this a couple years ago I had just finished writing a password regex with multiple assertions so I had lookarounds on the mind. What does the `14` essentially do in my version then? Just make sure there are 10-14 but not limit to 14? – xr280xr Apr 27 '20 at 17:36
  • xr280xr, I was confused in my last comment, at the time thinking it was tshiono's question. I just provided an answer. – Cary Swoveland Apr 27 '20 at 18:45

1 Answer

The problem with the regular expression

^(?=(?:\D*\d){10,14}\D*|\s*$).*$

is that there is no end-of-line anchor ($) between \D* and |. Consider, for example, the string

12345678901234567890

which contains 20 digits. The lookahead will be satisfied because (?:\D*\d){10,14} will match

12345678901234

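and then \D* will match zero non-digits. Nothing forces the lookahead to reach the end of the string, so the remaining six digits are simply consumed by .*$. By contrast, the regex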

^(?=(?:\D*\d){10,14}\D*$|\s*$).*$

will fail (as it should).
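
As a quick check of the difference, here is a minimal JavaScript sketch that runs both patterns against the 20-digit string. The variable names (`original`, `anchored`, `twentyDigits`) are made up for illustration, and the extra 10-digit sample is my own, not from the question.

```javascript
// Pattern from the question: the first lookahead alternative has no `$`.
const original = /^(?=(?:\D*\d){10,14}\D*|\s*$).*$/;
// Same pattern with `$` added after `\D*` inside the lookahead.
const anchored = /^(?=(?:\D*\d){10,14}\D*$|\s*$).*$/;

const twentyDigits = "12345678901234567890";

console.log(original.test(twentyDigits));     // true  (over-matches)
console.log(anchored.test(twentyDigits));     // false (rejected, as intended)
console.log(anchored.test("(555) 123-4567")); // true  (10 digits, illustrative sample)
console.log(anchored.test(""));               // true  (empty string still allowed)
```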

There is, however, no need for a lookahead. One can simplify the earlier expression to

^(?:(?:\D*\d){10,14}\D*)?$

Demo

Making the outer non-capture group optional allows the regex to match empty strings, as required.
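
The following JavaScript sketch exercises the lookahead-free pattern at the boundaries. The sample strings and the names `phone` and `samples` are assumptions chosen for illustration.

```javascript
// Lookahead-free pattern: an optional group of 10-14 digits with
// arbitrary non-digits in between, anchored at both ends.
const phone = /^(?:(?:\D*\d){10,14}\D*)?$/;

// Illustrative inputs: empty, 10 digits, 9 digits, 14 digits, 15 digits.
const samples = [
  "",
  "555-123-4567",
  "123456789",
  "+1 (234) 567-8901 x234",
  "123456789012345",
];

for (const s of samples) {
  console.log(JSON.stringify(s), phone.test(s));
}
// "" and the 10- and 14-digit samples pass; the 9- and 15-digit samples fail.
```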

There may be a problem with this last regex, as illustrated at the link. Consider the string

\nabc12\nab12c3456d789efg

The first match of (?:\D*\d) will be \nabc1 (as \D matches newlines), the second match will be 2, the third \nab1, and so on, for a total of 11 matches, satisfying the requirement that there be 10-14 digits. This undoubtedly is not intended. The solution is to change the regex to

^(?:(?:[^\d\n]*\d){10,14}[^\d\n]*)?$

[^\d\n] matches any character other than a digit or a newline.

Demo
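
To see the newline behaviour outside the demo, here is a small JavaScript sketch comparing the two patterns on the multi-line string above. The names `loose`, `strict` and `multiLine` are illustrative only, and no `m` flag is used, so `^` and `$` anchor to the whole string.

```javascript
// \D also matches "\n", so digits spread over several lines are counted together.
const loose = /^(?:(?:\D*\d){10,14}\D*)?$/;
// [^\d\n] excludes newlines, so the match cannot cross a line break.
const strict = /^(?:(?:[^\d\n]*\d){10,14}[^\d\n]*)?$/;

// The example string, written with real newline characters.
const multiLine = "\nabc12\nab12c3456d789efg";

console.log(loose.test(multiLine));       // true  (11 digits counted across lines)
console.log(strict.test(multiLine));      // false (a newline blocks the match)
console.log(strict.test("555-123-4567")); // true  (single-line input still accepted)
```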

Cary Swoveland
  • Great, thank you! I am using it on an HTML input so it is only a single line, but also on the server-side I am stripping it of all the non-digits anyway so I think either version will work for my purposes. – xr280xr Apr 27 '20 at 22:52