I have this JS code:
/^([a-zA-Z0-9]+[_|\_|\.]?)*[a-zA-Z0-9]+@([a-zA-Z0-9]+[_|\_|\.]?)*[a-zA-Z0-9]+\.[a-zA-Z]{2,3}$/
But what is the meaning of [_|\_|\.]? (JS regexp)
If we use a resource like Regexper, we can visualise this regular expression:
From this we can conclude that [_|\_|\.] matches exactly one of "_", "|" or ".". We can also see that declaring "_" and "|" twice is unnecessary. As HamZa commented, this segment can be shortened to [_|.] to achieve the same result.
In fact, we can even use resources like Regexper to visualise the entire expression.
It matches a pipe character, an underscore, or a period.
It is unnecessarily convoluted, however, and could be shortened to this:
[|_.]
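To check that the shorter spellings really are equivalent, here is a minimal sketch that runs all three character classes against a few single-character inputs. The variable names and the sample list are illustrative assumptions, not part of the original code, and the regexes are written without the `u` flag, as in the question.

```javascript
// The three spellings of the character class discussed in this thread.
// Each is anchored so that it must match the whole one-character input.
const originalClass = /^[_|\_|\.]$/;
const shorterClass = /^[_|.]$/;
const reorderedClass = /^[|_.]$/;

// Illustrative sample inputs (an assumption made for this sketch).
const samples = ["_", "|", ".", "\\", "a", "-", ""];

// Record how each spelling treats each sample.
const classResults = samples.map((input) => ({
  input,
  original: originalClass.test(input),
  shorter: shorterClass.test(input),
  reordered: reorderedClass.test(input),
}));

console.table(classResults);
```

All three columns should agree for every row: only "_", "|" and "." are accepted. Note that a literal backslash is not accepted, because `\_` and `\.` are escapes, not extra members of the class.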
regex101 is a very good tool for understanding regular expressions. Its explanation of this part of your expression reads:
Char class [_|\_|\.] 0 to 1 times [greedy] matches:
One of the following characters: _|_|.
In other words, [_|\_|\.] requires one of either "_", "|" or ".".
See the regex101 explanation of your expression.
[_|\_|\.] is probably meant to match an underscore (_) or a period (.), and should have been written as [_.].
I'm reasonably sure the author is using the pipe (|) to mean "or" (i.e., alternation), which isn't necessary inside a character class. As the other responders said, the pipe actually matches a literal pipe, but I don't believe that was the author's intent. It's a very common beginner's mistake.
The dot (.) is another special character that loses its special meaning when it appears in a character class. There's no need to escape it with a backslash as the author did, though it does no harm. And the underscore never has any special meaning; I won't even try to guess why the author listed it twice, once with a backslash and once without.
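A small sketch of the difference, for anyone who wants to see it in a console. The short test patterns and the sample address are made up for illustration; only the last regex is the one from the question.

```javascript
// Inside a character class, "|" and "." are ordinary characters.
const pipeInClass = /^[a|b]$/; // matches "a", "|" or "b"
const pipeOutside = /^(a|b)$/; // alternation: matches "a" or "b"
const dotInClass = /^[.]$/;    // matches only a literal "."
const dotOutside = /^.$/;      // matches any single character except a line terminator

const checks = {
  pipeInClassMatchesPipe: pipeInClass.test("|"),
  pipeOutsideMatchesPipe: pipeOutside.test("|"),
  dotInClassMatchesX: dotInClass.test("x"),
  dotOutsideMatchesX: dotOutside.test("x"),
};

// Consequence for the regex from the question: a pipe is accepted as a separator.
const questionRegex =
  /^([a-zA-Z0-9]+[_|\_|\.]?)*[a-zA-Z0-9]+@([a-zA-Z0-9]+[_|\_|\.]?)*[a-zA-Z0-9]+\.[a-zA-Z]{2,3}$/;
const acceptsPipe = questionRegex.test("john|doe@example.com");

console.log(checks, { acceptsPipe });
```

The last check is the practical reason to write [_.] instead: with the class as written, an address with a pipe in the local part passes validation.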
You didn't ask about it, but the ? doesn't belong there either. That's what makes the regex so horribly inefficient, as Kobi remarked. The idea was to match one or more alphanumerics, then optionally match a separator character (dot or underscore), which must be followed by some more alphanumerics, repeating as needed. Here's how I would write that:
[a-zA-Z0-9]+([_.][a-zA-Z0-9]+)*
If it runs out of alphanumerics and the next character is not _ or ., it skips that whole section and tries to match the next part. And if it can't do that, it can bail out immediately because no match is possible. But the way your regex is written, the separator is optional independently of the things it's supposed to separate, which makes it useless. The regex engine has to keep backing up, trying to match characters that it has already consumed in endless, pointless combinations before it can give up. And that, unfortunately, is another common mistake.
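To make the backtracking visible, here is a rough sketch that times both versions on an input that cannot match. The full `rewrittenRegex` is my assumption of how the fragment above would be applied to both sides of the @; the `measure` helper and the sample inputs are hypothetical as well. Timings depend on the engine and machine, so treat the numbers as indicative only.

```javascript
// The regex from the question.
const originalRegex =
  /^([a-zA-Z0-9]+[_|\_|\.]?)*[a-zA-Z0-9]+@([a-zA-Z0-9]+[_|\_|\.]?)*[a-zA-Z0-9]+\.[a-zA-Z]{2,3}$/;

// Assumed rewrite: the fragment from this answer used for both the
// local part and the domain part.
const rewrittenRegex =
  /^[a-zA-Z0-9]+([_.][a-zA-Z0-9]+)*@[a-zA-Z0-9]+([_.][a-zA-Z0-9]+)*\.[a-zA-Z]{2,3}$/;

// Hypothetical helper: run one test and report the result and elapsed time.
function measure(regex, input) {
  const start = Date.now();
  const matched = regex.test(input);
  return { matched, ms: Date.now() - start };
}

// An ordinary address, which both versions should accept.
const valid = "john.doe@example.com";

// No "@" at all, so neither version can match. Kept short on purpose:
// each extra character roughly doubles the work for the original.
const hopeless = "a".repeat(20) + "!";

const report = {
  originalValid: measure(originalRegex, valid),
  rewrittenValid: measure(rewrittenRegex, valid),
  originalHopeless: measure(originalRegex, hopeless),
  rewrittenHopeless: measure(rewrittenRegex, hopeless),
};

console.log(report);
```

Both versions reject the second input; the difference is how long the original takes to get there as the run of alphanumerics grows.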