4

I have this JavaScript regular expression:

/^([a-zA-Z0-9]+[_|\_|\.]?)*[a-zA-Z0-9]+@([a-zA-Z0-9]+[_|\_|\.]?)*[a-zA-Z0-9]+\.[a-zA-Z]{2,3}$/

But what is the meaning of [_|\_|\.]? in this JS regexp?

Shadow The GPT Wizard
JackSun
  • 8
    It's nonsense: it's a character class which means `match _ or | or .` zero or one time. It could be shortened to `[|_.]?` but I doubt that was the intention of its writer. – HamZa Oct 03 '13 at 08:31
  • 3
    The regex is [written terribly](http://stackoverflow.com/questions/9687596/slow-regex-performance), and will perform poorly. For example, try it on `aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa@a.a`: http://www.regex101.com/r/bM1fK8 . Besides that, it doesn't support all valid domains (eg full Unicode), TLDs (eg .museum), or email names (eg `email+tag@example.com`). You can find a better pattern. – Kobi Oct 03 '13 at 08:41
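
As a rough check of the coverage claims in the comment above, the pattern from the question can be run against a few short addresses. This is only a sketch: `user@example.com` and `user@example.museum` are made-up samples, and the long `aaaa...@a.a` input is deliberately left out because it may take a very long time to fail.

```javascript
// The pattern from the question, unchanged.
const emailPattern = /^([a-zA-Z0-9]+[_|\_|\.]?)*[a-zA-Z0-9]+@([a-zA-Z0-9]+[_|\_|\.]?)*[a-zA-Z0-9]+\.[a-zA-Z]{2,3}$/;

// Sample addresses (illustrative only).
const addresses = [
  'user@example.com',       // plain address
  'email+tag@example.com',  // "+" is not in any of the character classes
  'user@example.museum',    // TLD longer than three letters
];

// One boolean per address: does the pattern accept it?
const accepted = addresses.map(addr => emailPattern.test(addr));
console.log(accepted);
```

Only the first address is accepted; the other two are valid in practice but rejected by this pattern.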

4 Answers

11

If we use a resource like Regexper, we can visualise this regular expression:

[Image: Regexper visualisation]

From this we can conclude that [_|\_|\.] requires one of either "_", "|" or ".". We can also see that the double declaration of "_" and "|" is unnecessary. As HamZa commented, this segment can be shortened to [_|.] to achieve the same result.

In fact, we can even use resources like Regexper to visualise the entire expression.
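
The equivalence of the original character class and the shortened one can also be checked directly. A small sketch, where the list of probe characters is an arbitrary choice made for illustration:

```javascript
// Original character class and the shortened form, each anchored to one character.
const originalClass = /^[_|\_|\.]$/;
const shortenedClass = /^[|_.]$/;

// Probe characters: the three expected matches plus a few that should fail.
const probes = ['_', '|', '.', '\\', 'a', '-', ''];

// Both classes should agree on every probe.
const classesAgree = probes.every(
  ch => originalClass.test(ch) === shortenedClass.test(ch)
);

// The characters the original class actually accepts.
const matchedChars = probes.filter(ch => originalClass.test(ch));

console.log(classesAgree, matchedChars);
```

Note that the backslash itself is not matched: inside the class, `\_` and `\.` are just escaped forms of `_` and `.`.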

James Donnelly
5

It matches a pipe character, an underscore, or a period.
It is unnecessarily convoluted, however, and could be shortened to this:
[|_.]

Joe Simmons
5

Regex101 is a very good tool for understanding regular expressions. Its explanation of this segment reads:

Char class [_|\_|\.] 0 to 1 times [greedy] matches:

[_|\_|\.] One of the following characters: _|_|.

In other words, [_|\_|\.] requires one of "_", "|" or ".", and the trailing ? makes that character optional.

See the Regex101 explanation of your full expression.
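
The "0 to 1 times [greedy]" part can be illustrated with a short sketch. The surrounding `x` and `y` are placeholder characters added here so that the optional separator is visible; they are not part of the original expression.

```javascript
// The segment from the question, wrapped between two placeholder letters.
const optionalSep = /^x[_|\_|\.]?y$/;

const zeroSeps = optionalSep.test('xy');    // separator absent: still matches
const oneSep = optionalSep.test('x.y');     // exactly one separator: matches
const twoSeps = optionalSep.test('x..y');   // two separators: no match

// Greedy: if a separator is available, ? consumes it.
const greedyMatch = /[_|\_|\.]?/.exec('.a')[0];
// Lazy (??) prefers the empty match at the same position.
const lazyMatch = /[_|\_|\.]??/.exec('.a')[0];

console.log(zeroSeps, oneSep, twoSeps, greedyMatch, lazyMatch);
```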

3

[_|\_|\.] is probably meant to match an underscore (_) or a period (.), and should have been written as [_.].

I'm reasonably sure the author is using the pipe (|) to mean "or" (i.e., alternation), which isn't necessary inside a character class. As the other responders said, the pipe actually matches a literal pipe, but I don't believe that was the author's intent. It's a very common beginner's mistake.

The dot (.) is another special character that loses its special meaning when it appears in a character class. There's no need to escape it with a backslash as the author did, though it does no harm. And the underscore never has any special meaning; I won't even try to guess why the author listed it twice, once with a backslash and once without.
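
A short sketch of the two points above, using small throwaway patterns that are not taken from the question:

```javascript
// Inside a character class, "|" and "." are ordinary characters.
const classWithPipe = /^[_|.]$/;  // matches "_", "|" or "."
const intendedClass = /^[_.]$/;   // matches only "_" or "."

const pipeIsLiteral = classWithPipe.test('|');       // the pipe is matched literally
const intendedRejectsPipe = !intendedClass.test('|');

// An unescaped dot in a class is a literal dot, not "any character".
const dotIsLiteral = /^[.]$/.test('.') && !/^[.]$/.test('a');

// Outside a class, "|" really is alternation, and the dot must be escaped.
const alternation = /^(?:_|\.)$/;
const alternationRejectsPipe = !alternation.test('|');

console.log(pipeIsLiteral, intendedRejectsPipe, dotIsLiteral, alternationRejectsPipe);
```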

You didn't ask about it, but the ? doesn't belong there either. That's what makes the regex so horribly inefficient, as Kobi remarked. The idea was to match one or more alphanumerics, then optionally match a separator character (dot or underscore), which must be followed by some more alphanumerics, repeating as needed. Here's how I would write that:

[a-zA-Z0-9]+([_.][a-zA-Z0-9]+)*

If it runs out of alphanumerics and the next character is not _ or ., it skips that whole section and tries to match the next part. And if it can't do that, it can bail out immediately because no match is possible. But the way your regex is written, the separator is optional independently of the things it's supposed to separate, which makes it useless. The regex engine has to keep backing up, trying to match characters that it has already consumed in endless, pointless combinations before it can give up. And that, unfortunately, is another common mistake.
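
A minimal sketch of how the rewritten fragment behaves. The `^` and `$` anchors and the sample strings are assumptions added here for testing; they are not part of the fragment itself.

```javascript
// The rewritten fragment, anchored so it must match the whole string.
const localPart = /^[a-zA-Z0-9]+([_.][a-zA-Z0-9]+)*$/;

// Sample inputs (illustrative only).
const samples = [
  'john',         // alphanumerics only
  'john.smith',   // one separator between two runs
  'a_b.c',        // mixed separators
  'john.',        // trailing separator
  '.john',        // leading separator
  'john..smith',  // two separators in a row
  'john|smith',   // pipe is no longer accepted
];

const verdicts = samples.map(s => localPart.test(s));

// A long input that cannot match: every separator must be followed by
// alphanumerics, so there is only one way to split the run of "a"s
// and the engine can give up after a linear scan.
const longFailure = localPart.test('a'.repeat(38) + '!');

console.log(verdicts, longFailure);
```

The first three samples match and the rest do not. Running the original pattern from the question on a similarly long failing input is left out on purpose, since it may not finish in reasonable time.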

Alan Moore