I have a problem with the following regular expression:
```javascript
var s = "http://www.google.com/dir/file\r\nhello";
var re = new RegExp("http://([^/]+).*/([^/\r\n]+)$");
var arr = re.exec(s);
alert(arr[2]);
```
Above, I expect arr[2] (i.e., capture group 2) to be "file": after the greedy .* backtracks far enough for the / in the pattern to match, group 2 should match the last 4 characters of the first line, and $ should then anchor at the end of that line.
In fact, arr is null, which means the pattern did not match at all.
I can alter this slightly so it does precisely what I intend:
```javascript
var s = "http://www.google.com/dir/file\r\nhello";
var re = new RegExp("http://([^/]+).*/([^/\r\n]+)[\r\n]*");
var arr = re.exec(s);
alert(arr[2]); // "file", as expected
```
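One observation that may help narrow things down: the trailing `[\r\n]*` in the second pattern can match zero characters, so it places no constraint on where the match ends. The sketch below (a check, not a fix; it logs to the console instead of calling `alert`) compares the second pattern with a variant that drops that suffix entirely. Both report `file`; the only difference is whether the `\r\n` ends up in the overall match.

```javascript
const s = "http://www.google.com/dir/file\r\nhello";

// Second pattern from the question, with the trailing [\r\n]*.
const withSuffix = new RegExp("http://([^/]+).*/([^/\r\n]+)[\r\n]*").exec(s);

// Same pattern with the suffix removed: nothing anchors the end at all.
const withoutSuffix = new RegExp("http://([^/]+).*/([^/\r\n]+)").exec(s);

console.log(withSuffix[2]);    // file
console.log(withoutSuffix[2]); // file

// [\r\n]* greedily consumes the "\r\n" in the first case.
console.log(JSON.stringify(withSuffix[0]));    // "http://www.google.com/dir/file\r\n"
console.log(JSON.stringify(withoutSuffix[0])); // "http://www.google.com/dir/file"
```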
My question is not so much HOW to grab "file" from the end of the first line in s. Instead, I'm trying to understand WHY the first regexp fails and the second succeeds. Why does $ not match at the \r\n line break in example 1? Isn't that its sole purpose? Is there something else I'm missing?
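A quick way to probe the `$` question is to run the first pattern twice: once exactly as written, and once with the `m` (multiline) flag added. In JavaScript, without `m`, `$` matches only at the very end of the input; with `m`, it also matches immediately before a line terminator such as `\r` or `\n`. This is a minimal sketch assuming a Node.js or browser console (it logs instead of calling `alert`).

```javascript
// Same input and pattern as the first example above.
const s = "http://www.google.com/dir/file\r\nhello";
const pattern = "http://([^/]+).*/([^/\r\n]+)$";

// No flags: $ can only match at the end of the whole string ("...hello"),
// and neither . nor [^/\r\n] can cross the \r\n to get there.
const plain = new RegExp(pattern).exec(s);

// "m" flag: $ may also match right before a line terminator,
// i.e. between "file" and "\r".
const multiline = new RegExp(pattern, "m").exec(s);

console.log(plain);                     // null
console.log(multiline && multiline[2]); // file
```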
Also, consider the same first regular expression as used in sed (with extended regular expression mode enabled with -r):
```
$ echo -e "http://www.google.com/dir/file\r\nhello" | sed -r -e 's#http://([^/]+).*/([^/\r\n]+)$#\2.OUTSIDE.OF.CAPTURE.GROUP#'
file.OUTSIDE.OF.CAPTURE.GROUP
hello
```
Here, capture group 2 captures "file" and nothing else. "hello" appears in the output but is not part of the capture group, as the position of the string ".OUTSIDE.OF.CAPTURE.GROUP" in the output shows. So in sed the regular expression behaves the way I understand it should, but not with JavaScript's built-in regexp engine.
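One relevant difference: sed reads its input one line at a time, strips the trailing newline, and applies the script to each line separately, so `$` there effectively means "end of this line". A rough JavaScript emulation of that is sketched below. `sedLike` is a hypothetical helper, not part of any library, and the sketch uses the `\n`-only variant of the input mentioned in the question, to sidestep how a particular sed build treats `\r` inside a bracket expression.

```javascript
// Hypothetical helper: apply a substitution to each line separately,
// roughly the way sed processes one input line at a time.
function sedLike(input, re, replacement) {
  return input
    .split("\n")
    .map((line) => line.replace(re, replacement))
    .join("\n");
}

// \n-only variant of the input.
const input = "http://www.google.com/dir/file\nhello";

// Same pattern as the first JavaScript example, still WITHOUT the m flag:
// each line is tested on its own, so $ lands right after "file".
const re = new RegExp("http://([^/]+).*/([^/\r\n]+)$");

const out = sedLike(input, re, "$2.OUTSIDE.OF.CAPTURE.GROUP");
console.log(out);
// file.OUTSIDE.OF.CAPTURE.GROUP
// hello
```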
If I replace \r\n in the input string with just \n, the behavior is identical in all three examples above, so the line-break style should not be relevant as far as I can tell.
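This is consistent with how JavaScript regular expressions define line terminators: `\r` and `\n` both count, so `.` (without the `s` flag) matches neither, and a multiline `$` matches before either. A small check along those lines, as a sketch (`crlf` and `lf` are just illustrative names):

```javascript
const crlf = "http://www.google.com/dir/file\r\nhello";
const lf = "http://www.google.com/dir/file\nhello";
const pattern = "http://([^/]+).*/([^/\r\n]+)$";

// "." does not match either line terminator (no "s" flag).
console.log(/./.test("\r"), /./.test("\n")); // false false

// A multiline "$" matches right before either one.
console.log(/e$/m.test("file\rx"), /e$/m.test("file\nx")); // true true

// So the first pattern behaves the same for both inputs.
for (const input of [crlf, lf]) {
  const plain = new RegExp(pattern).exec(input);
  const multi = new RegExp(pattern, "m").exec(input);
  console.log(plain, multi && multi[2]); // null file
}
```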