The first thing to understand is that, inside a character class, most regex metacharacters lose their special meaning and are matched literally. For example, `*` matches a literal `*` and does not mean "zero or more repetitions". Similarly, `()` matches `(` and `)`, and does not create a capture group.
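A minimal sketch of that point, using Python's `re` module purely for illustration (the regex flavor in the question is not stated, and the input string is a made-up example):

```python
import re

# Inside [...], '*', '(' and ')' are ordinary characters to match.
literal_class = re.compile(r"[*()]")

found = literal_class.findall("f(x) * 2")
print(found)  # ['(', ')', '*']

# The parentheses inside the class do not create a capture group.
print(literal_class.groups)  # 0
```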
Now, if a `]` is found in a character class, it closes the character class, and any characters after it are not part of that class. Let's look at what is happening above:
In 1, 2, and 4, your character class ends at the first closing `]`. So the last closing bracket, `]`, is not part of the character class. It has to be matched separately. So your patterns will match something like this:
'[[ab]]' is the same as '(\[|a|b)(\])' // The last `]` has to match.
'[ab[]]' is the same as '(a|b|\[)(\])' // Again, the last `]` has to match.
'[ab]]' is the same as '(a|b)(\])' // Same, the last `]` has to match.
    ^
    ^---- Character class closes here.
Since neither of your strings has a `]` at the end, no match is found.
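As a rough check of this reading, the sketch below runs the three patterns in Python's `re` module. The sample strings are hypothetical, since the strings from the question are not reproduced here, and other flavors (Java, for instance) treat a nested `[` differently, so this illustrates Python's behavior only.

```python
import re
import warnings

# Hypothetical inputs chosen for illustration only.
samples = ["ab", "[ab", "a]", "[]"]
patterns = ["[[ab]]", "[ab[]]", "[ab]]"]

results = {}
with warnings.catch_warnings():
    # Python's re emits a FutureWarning for '[' right after the opening bracket.
    warnings.simplefilter("ignore", FutureWarning)
    for pattern in patterns:
        regex = re.compile(pattern)
        # Keep the samples in which the pattern matches somewhere.
        results[pattern] = [s for s in samples if regex.search(s)]

for pattern, hits in results.items():
    print(f"{pattern!r} matches {hits}")
```

Only the samples that contain a `]` right after a character from the class are reported. Strings without a `]` never match.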
In the 3rd pattern, on the other hand, your character class is closed only by the last `]`, so everything before it is inside the character class.
'[ab[]' means match a string that contains 'a', or 'b', or '['
which is perfectly valid and matches both strings.
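A short sketch of the 3rd pattern in Python's `re` module, again with made-up input strings rather than the ones from the question:

```python
import re

# One character class containing 'a', 'b' and '['.
third = re.compile(r"[ab[]")

chars = third.findall("[ab]")
print(chars)  # ['[', 'a', 'b']

has_match = bool(third.search("ab"))
no_match = bool(third.search("xyz"))
print(has_match, no_match)  # True False
```

The trailing `]` of the input is not in the class, so `findall` does not return it.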
And what do `[(ab)]` and `[^(ab)]` mean?
`[(ab)]` means match any one of `(`, `a`, `b`, `)`. Remember, inside a character class, metacharacters such as `(` and `)` have no special meaning. So you can't create groups inside a character class.
`[^(ab)]` means the exact opposite of `[(ab)]`. It matches any single character that is not one of the characters listed.
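To make the two classes concrete, here is a small sketch in Python's `re` module. The input `"(ab)c"` is an arbitrary example, not taken from the question.

```python
import re

text = "(ab)c"

# Each of '(', 'a', 'b', ')' is matched as a single literal character.
inside = re.findall(r"[(ab)]", text)
print(inside)  # ['(', 'a', 'b', ')']

# The negated class matches every character that is not one of those four.
outside = re.findall(r"[^(ab)]", text)
print(outside)  # ['c']
```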
Is it the same as `[ab]` and `[^ab]`?
No. These two do not include `(` and `)`, so they are a little different.