The ? in this context is a modifier applied to the repetition operators (+, *, and ?). In engines that support it, it makes the repetition lazy (also called non-greedy or reluctant). By default, repetition is greedy, which means it matches as much as possible. So you have three types of repetition in most modern Perl-compatible engines:
.* # Match any character zero or more times, as many as possible (greedy)
.*? # Match any character zero or more times, as few as possible, until the rest of the pattern can match (reluctant)
.*+ # Match any character zero or more times and never give any of it back (possessive)
More information can be found here: http://www.regular-expressions.info/repeat.html#lazy for reluctant/lazy and here: http://www.regular-expressions.info/possessive.html for possessive (which I'll skip discussing in this answer).
Suppose we have the string aaaa. We can match all of the a's with /(a+)a/. Literally this is "match one or more a's followed by an a". This will match aaaa. The regex is greedy and will match as many a's as possible. The first submatch is aaa.
If we use the regex /(a+?)a/ this is "reluctantly match one or more a's followed by an a" or "match one or more a's until we reach another a". That is, only match what we need. So in this case the match is aa and the first submatch is a. We only need to match one a to satisfy the repetition, and then it is followed by an a.
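To see the difference concretely, here is a minimal sketch of the two patterns above run against aaaa. It uses Python's re module purely for illustration (the choice of engine is my assumption; any engine with lazy quantifiers should behave the same way for these patterns).

```python
import re

text = "aaaa"

# Greedy: a+ grabs as many a's as it can, then gives one back
# so the trailing literal a can still match.
greedy = re.search(r"(a+)a", text)

# Reluctant: a+? takes a single a, and the trailing literal a
# matches immediately after it.
lazy = re.search(r"(a+?)a", text)

print(greedy.group(0), greedy.group(1))  # aaaa aaa
print(lazy.group(0), lazy.group(1))      # aa a
```

group(0) is the whole match and group(1) is the first submatch, so the output lines up with the description above.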
This comes up a lot when using regex to match within HTML tags, quotes, and the like, usually for quick-and-dirty operations. That is to say, using regex to extract from very large and complex HTML strings, or from quoted strings with escape sequences, can cause a lot of problems, but it's perfectly fine for specific use cases. So in your case we have:
/Dev/videos/1610110089242029/
The expression needs to match videos/ followed by zero or more characters followed by /". If there is only one videos URL there, that's just fine without being reluctant.
However, we have:
/videos/1610110089242029/" ... ajaxify="/Dev/videos/1610110089242029/"
Without reluctance, the regex will match:
1610110089242029/" ... ajaxify="/Dev/videos/1610110089242029
It tries to match as much as possible, and / and " satisfy . just fine. With reluctance, the matching stops at the first /" (actually it backtracks, but you can read about that separately). Thus you only get the part of the URL you need.
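As a rough check of the above, here is a small sketch in Python. The pattern videos/(.*)/" is my reconstruction from the description ("videos/ followed by zero or more characters followed by /""), not necessarily the exact expression you are using, and the sample string is the snippet shown above with the elided " ... " part left exactly as written.

```python
import re

# Sample text from above; the " ... " in the middle is the elided part, kept as-is.
html = '/videos/1610110089242029/" ... ajaxify="/Dev/videos/1610110089242029/"'

# Greedy: .* runs to the end of the string and backs up only as far
# as the LAST /" it can find.
greedy_id = re.search(r'videos/(.*)/"', html).group(1)

# Reluctant: .*? grows one character at a time and stops at the FIRST /".
lazy_id = re.search(r'videos/(.*?)/"', html).group(1)

print(greedy_id)  # 1610110089242029/" ... ajaxify="/Dev/videos/1610110089242029
print(lazy_id)    # 1610110089242029
```

The only difference between the two patterns is the extra ?, which is what limits the capture to the first id.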