
I have a regular expression that extracts URLs from a given string. It's written in C#, and I want to convert it to JavaScript:

```csharp
private static Regex urlPattern = new Regex(@"(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'"".,<>?«»“”‘’]))", RegexOptions.Compiled | RegexOptions.IgnoreCase);
```

But since JavaScript has no verbatim strings, when I try the following I get an error:

```javascript
var regexToken = /(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'"".,<>?«»“”‘’]))/i;
```

How can I easily convert this? I get the following error: `SyntaxError: invalid quantifier`

Brian
Justin Homes
  • "there is no verbatim and it gives me error" can you please specify what this means and **what your error is** – tnw Apr 11 '13 at 19:27
  • why are you using `regex` to extract `url`? – Anirudha Apr 11 '13 at 19:27
  • @The_Land_Of_Devils_SriLanka how else would you identify the URL pattern in a given string? The string has multiple URLs. – Justin Homes Apr 11 '13 at 19:34
  • @JustinHomes regex is used for regularly occurring patterns. Parsing URLs from HTML should never be done with regex, because it was never made for that purpose. `html` is **not** strict (except `xhtml`) and would **certainly** break your code. You would do better to use a parser like [htmlagilitypack](http://htmlagilitypack.codeplex.com/). Even if your string is not HTML, you would still be able to extract all the URLs. – Anirudha Apr 11 '13 at 19:38
  • htmlagilitypack is a server-side solution; I need a client-side solution. – Justin Homes Apr 11 '13 at 19:43

1 Answer


`(?i)` is not a valid way to set the ignoreCase flag in JavaScript (Opera apparently ignores it, but in your browser it throws a SyntaxError). Flags can only be given as a suffix of the regular expression literal, or as a string in the second argument of the `RegExp` constructor.
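A minimal sketch of that point, runnable in Node.js or a browser console: a pattern that starts with `(?i)` is rejected, while the same flag given as a literal suffix or as the constructor's second argument works. `throwsSyntaxError` is a hypothetical helper written only for this illustration.

```javascript
// Hypothetical helper: reports whether compiling `source` throws a SyntaxError.
function throwsSyntaxError(source) {
  try {
    new RegExp(source);
    return false;
  } catch (e) {
    return e instanceof SyntaxError;
  }
}

// A leading inline flag group such as "(?i)" is not JavaScript regex syntax.
console.log(throwsSyntaxError("(?i)abc")); // true

// The flag goes after the closing slash of a literal...
const literal = /abc/i;
// ...or into the second argument of the RegExp constructor.
const constructed = new RegExp("abc", "i");

console.log(literal.test("ABC"));     // true
console.log(constructed.flags);       // "i"
console.log(constructed.test("aBc")); // true
```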

Also, you forgot to escape the forward slashes: since they delimit the literal, any `/` inside the pattern must be written as `\/`.
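To illustrate the escaping rule in isolation, here is a short sketch using a hypothetical, much smaller pattern that matches only a URL scheme prefix (an assumption for illustration, not part of the answer's regex). The literal needs `\/`, whereas a string passed to `RegExp` can use a plain `/` but must double every backslash.

```javascript
// Regex literal: "/" would end the literal, so it is escaped as "\/".
const literalScheme = /^[a-z][\w-]+:\/{1,3}/i;

// String source for RegExp: "/" is an ordinary character here, but each
// regex backslash must be doubled ("\\w" in the string becomes \w).
const constructedScheme = new RegExp("^[a-z][\\w-]+:/{1,3}", "i");

console.log(literalScheme.test("HTTP://example.com"));     // true
console.log(constructedScheme.test("HTTP://example.com")); // true
// "mailto:" has no slash after the colon, so neither pattern matches it.
console.log(literalScheme.test("mailto:someone"));         // false
```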

Use either

```javascript
var regexToken = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'"".,<>?«»“”‘’]))/i;
```

or (slightly more complicated, since every backslash has to be doubled inside a string literal)

```javascript
var regexToken = new RegExp("\\b((?:[a-z][\\w-]+:(?:/{1,3}|[a-z0-9%])|www\\d{0,3}[.]|[a-z0-9.\\-]+[.][a-z]{2,4}/)(?:[^\\s()<>]+|\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\))+(?:\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\)|[^\\s`!()\\[\\]{};:'\"\".,<>?«»“”‘’]))", "i");
```
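Since the comments mention that the string contains multiple URLs, here is a minimal usage sketch that adds the `g` flag to the literal version so that `String.prototype.match` returns every match rather than only the first. `extractUrls` is a hypothetical helper name, not part of the answer; the sample text is made up for illustration.

```javascript
// The answer's regex literal, with "g" added alongside "i" so that
// String.prototype.match collects all matches instead of the first one.
const urlPattern = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'"".,<>?«»“”‘’]))/gi;

// Hypothetical helper: returns every URL-like substring, or [] when
// match() finds nothing (it returns null in that case).
function extractUrls(text) {
  return text.match(urlPattern) || [];
}

// Returns ["http://example.com/path", "www.test.org"]; the trailing "."
// after "today" is not part of any match.
console.log(extractUrls("Visit http://example.com/path and www.test.org today."));
```

With the `g` flag, `match` returns only the full matches and drops the capture groups; use `matchAll` or a `RegExp.prototype.exec` loop if you also need the groups.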
Bergi