8

I'm trying to include two positive lookaheads in one regex. Here's the problem I'm working on as an example.

(?=[a-zA-Z])(?=[0-9])[a-zA-Z0-9]{0,20}

This is what I'm trying to match:

  • 0-20 characters
  • one or more letters anywhere
  • one or more digits anywhere
  • only letters and numbers allowed

When I do this with only one lookahead, it works, but as soon as I add the other, it breaks. What's the correct syntax for two lookaheads?

Qaz
  • How about this `(?=.*[a-zA-Z])(?=.*[0-9])[a-zA-Z0-9]{0,20}`? – revo Sep 30 '14 at 20:19
  • Side note, that doesn't allow for underscores. You need `[a-zA-Z0-9_]{0,20}`, which is synonymous to `\w{0,20}`. – Sam Sep 30 '14 at 20:22
  • I realized I forgot to include underscores in my original regex, so I took it out of the list of requirements. I'll use \w in the final version. Thanks! – Qaz Sep 30 '14 at 20:25
  • @revo Fails where a pattern of `\d{20}[a-zA-Z]` matches (i.e. if a lookahead has to look more than 20 characters ahead to succeed, the result is wrong in terms of the specification). Hard to fix, since variable-length lookarounds are a pain for regex engines. (`(?=.*{0,19}[a-zA-Z])(?=.*{0,19}\d)[a-zA-Z0-9]{0,20}` should work if it compiles.) – AlexR Sep 30 '14 at 20:29
  • @AlexR Yes, add a `$` at the end of regex. – revo Sep 30 '14 at 20:34
  • @revo I elaborated on the effects of the different approaches to fix the issue :) – AlexR Sep 30 '14 at 20:42

2 Answers

12

Lookaheads are like wanders! Each of your lookaheads is limited to the single character at the current position, so the two can never succeed together: that character cannot be a letter and a digit at once. Put a greedy dot .* (or lazy .*?) inside each lookahead so that it can scan ahead for its own requirement.
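
To make the difference concrete, here is a minimal sketch using Python's re module (the engine is my assumption; the question does not name one). It compares the pattern from the question with a version whose lookaheads are allowed to skip ahead.

```python
import re

# Pattern from the question: both lookaheads inspect only the next character,
# which cannot be a letter and a digit at the same time.
original = re.compile(r"(?=[a-zA-Z])(?=[0-9])[a-zA-Z0-9]{0,20}")

# Each lookahead may now skip any number of characters before its target.
relaxed = re.compile(r"(?=.*[a-zA-Z])(?=.*[0-9])[a-zA-Z0-9]{0,20}")

for text in ["abc123", "123abc", "abc"]:
    # original never matches; relaxed matches once both requirements are met
    print(text, original.search(text), relaxed.search(text))
```

The relaxed pattern is still unanchored here, so it only shows why the lookaheads need room to look ahead, not how to enforce the length limit.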

As @AlexR pointed out in the comments, the pattern also needs anchors, so I modified the regex a little:

^(?=.*[a-zA-Z])(?=.*[0-9])[a-zA-Z0-9_]{0,20}$

By the way, you forgot to match underscores, so I added _ to the character class.

The above is almost equivalent to:

^(?=[^a-zA-Z]*[a-zA-Z])(?=\D*\d)\w{1,20}$
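
One way to sanity-check the "almost equivalent" claim is to run both patterns over a few inputs. This is a sketch in Python's re module (my choice of engine); the sample strings are my own, and re.ASCII is set so that \w and \d mean [a-zA-Z0-9_] and [0-9].

```python
import re

dot_star = re.compile(r"^(?=.*[a-zA-Z])(?=.*[0-9])[a-zA-Z0-9_]{0,20}$")
negated = re.compile(r"^(?=[^a-zA-Z]*[a-zA-Z])(?=\D*\d)\w{1,20}$", re.ASCII)

# Hypothetical sample inputs, including an empty string and a 21-character one.
samples = ["abc123", "abc", "123", "a_1", "", "a" * 20 + "1", "ab 12"]

for s in samples:
    print(repr(s), bool(dot_star.search(s)), bool(negated.search(s)))
```

Both patterns give the same verdict on each of these samples. The {0,20} versus {1,20} difference makes no practical difference, because the two lookaheads already require at least two characters.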
revo
2

A problem with the unanchored pattern from @revo's comment occurs when the input is too long: 01234567890123456789A passes both lookaheads, and the final check then matches its first 20 characters. A fixed version either checks for start and end of string with ^ and $, or limits how far each lookahead may look (or both):

^(?=.{0,19}[a-zA-Z])(?=.{0,19}[0-9])[a-zA-Z0-9]{0,20}$ // (1), (1*) without ^
^(?=.*[a-zA-Z])(?=.*[0-9])[a-zA-Z0-9]{0,20}$
(?=.{0,19}[a-zA-Z])(?=.{0,19}[0-9])[a-zA-Z0-9]{0,20} // (2)

Only the last variant, (2), allows text around the matched string. Omitting the ^ from the first variant, which gives (1*), allows the password to be preceded by other text, e.g.:

Input            : "Password1 = ASDF0123"
Matches with (1) : none
Matches with (1*): "ASDF0123"
Matches with (2) : "Password1", "ASDF0123"
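
The table above can be reproduced with a short sketch in Python's re module (the engine is my assumption). nonempty_matches is a hypothetical helper: findall also reports empty matches at positions where both lookaheads succeed but no letter or digit follows, and the helper drops those.

```python
import re

LOOK = r"(?=.{0,19}[a-zA-Z])(?=.{0,19}[0-9])"
BODY = r"[a-zA-Z0-9]{0,20}"

variants = {
    "(1)": re.compile("^" + LOOK + BODY + "$"),
    "(1*)": re.compile(LOOK + BODY + "$"),
    "(2)": re.compile(LOOK + BODY),
}

def nonempty_matches(pattern, text):
    """Return all non-empty, non-overlapping matches of pattern in text."""
    return [m for m in pattern.findall(text) if m]

text = "Password1 = ASDF0123"
for name, pattern in variants.items():
    print(name, nonempty_matches(pattern, text))

# The over-long input from the answer: 20 digits followed by one letter.
too_long = "01234567890123456789A"
unanchored = re.compile(r"(?=.*[a-zA-Z])(?=.*[0-9])[a-zA-Z0-9]{0,20}")
print(unanchored.search(too_long))       # matches the first 20 characters
print(variants["(1)"].search(too_long))  # None: the letter is out of reach
```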
AlexR
  • You bring up a really interesting case. Since I'll be using this in a form, I can add a maxlength="20" attribute to the input tag. I wouldn't have realized I needed that without this. – Qaz Sep 30 '14 at 20:44
  • @Qaz I would still recommend using a stricter regex. A tool like Firebug allows editing on the client side, so you can never be sure your form POST (or GET) is sane, since it is under the control of your (possibly malicious) client. – AlexR Sep 30 '14 at 20:53
  • No matter how exact I make the regex, it's dead easy to edit it on the client side: I'm putting it right in the HTML! Don't worry, though. This is just for the benefit of the users so they know if their passwords fit the rules. I'll do a serious check on the server side. Also don't worry about how insecure the password rules are; this is not for production. – Qaz Sep 30 '14 at 21:03
  • @Qaz Okay, you didn't mention the regex was client-side anyway. I thought you planned on using it server-side for validation (which should then be one of my methods, depending on your needs). – AlexR Sep 30 '14 at 21:04
  • Yep, I'll definitely use a regex client-side, and maybe some other checks as well. I should have remembered JS wasn't just for the client side and mentioned specifically that this is going on both ends. – Qaz Sep 30 '14 at 21:12
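
For the server-side check discussed in these comments, a minimal sketch might look like the following. Python and the name is_valid_password are assumptions for illustration; the thread does not say what the server runs. re.fullmatch is used in place of ^ and $ because in Python $ also matches just before a trailing newline.

```python
import re

# Same rules as in the answers: letters and digits only, at most 20 characters,
# at least one letter and at least one digit.
PASSWORD_RE = re.compile(r"(?=.*[a-zA-Z])(?=.*[0-9])[a-zA-Z0-9]{0,20}")

def is_valid_password(candidate: str) -> bool:
    """Re-validate on the server; the client-side check can be bypassed."""
    return PASSWORD_RE.fullmatch(candidate) is not None

print(is_valid_password("abc123"))
print(is_valid_password("01234567890123456789A"))
```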