
This is more a question to satisfy my curiosity than a real need for help, but I will appreciate your help all the same, as it is driving me nuts.

I am trying to negate an exact string using JavaScript regular expressions. The idea is to exclude URLs that include the string "www", for instance from this list:

http://www.example.org/
http://status.example.org/index.php?datacenter=1
https://status.example.org/index.php?datacenter=2
https://www.example.org/Insights
http://www.example.org/Careers/Job_Opportunities
http://www.example.org/Insights/Press-Releases

For that I can successfully use the following regex:

/^http(|s):..[^w]/g

This works correctly, but while I can do a positive match, I cannot do something like:

/[^www]/g  or  /[^http]/g

To exclude lines that include the exact string www or http. I have tried the infamous "negative lookahead" like this:

/*(?: (?!www).*)/g 

But this doesn't work either, OR I cannot test it online; it doesn't work in Notepad++ either.
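A side note on why /[^www]/ and /[^http]/ cannot express "not www" or "not http": inside square brackets, ^ negates a set of single characters, not a string. So [^www] matches one character that is not "w" (the repeated letters are redundant), and [^http] matches one character that is not h, t or p. A minimal sketch, using a few made-up sample strings, showing that [^www] behaves exactly like [^w]:

```javascript
// Inside [...], ^ negates a SET of single characters, not a string.
// [^www] therefore means "one character that is not w", exactly like [^w].
var notWww = /[^www]/;
var notW = /[^w]/;

// Made-up sample strings, chosen only to illustrate the behaviour.
var samples = ["www", "w", "wxw", "http://www.example.org/"];

samples.forEach(function (s) {
  // Both patterns give the same answer for every sample.
  console.log(s, notWww.test(s), notW.test(s));
});
// www false false
// w false false
// wxw true true
// http://www.example.org/ true true
```

Note that the last sample still matches: the class only needs a single non-"w" character somewhere (here the leading "h"), so it cannot reject strings that contain "www".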

If I were using Perl, grep, awk or TextWrangler, I would simply have done:

!www   OR  !http

And this would have done the job.

So, my question is obviously: what would be the correct way to do such a thing in JavaScript? Does this depend on the regex parser (as I seem to understand)?

Thanks for any answer ;)

runlevel0

2 Answers


You need to add a negative lookahead at the start.

^(?!.*\bwww\.)https?:\/\/.*


(?!.*\bwww\.) is a negative lookahead: it asserts that the string we are going to match does not contain "www." anywhere. \b means word boundary, which matches between a word character and a non-word character. Without \b, the www. in your regex would also match the www. inside foowww.
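A minimal sketch of this pattern applied with Array.prototype.filter. The array holds the six URLs from the question plus one hypothetical host, http://foowww.example.org/, added only to show the effect of \b; the same pattern is also run without \b for comparison.

```javascript
// The question's six URLs plus one hypothetical host ("foowww.example.org")
// that exists only to show what \b changes.
var urls = [
  'http://www.example.org/',
  'http://status.example.org/index.php?datacenter=1',
  'https://status.example.org/index.php?datacenter=2',
  'https://www.example.org/Insights',
  'http://www.example.org/Careers/Job_Opportunities',
  'http://www.example.org/Insights/Press-Releases',
  'http://foowww.example.org/'
];

// Negative lookahead at the start: reject the string if "www." appears
// anywhere right after a word boundary.
var withBoundary = /^(?!.*\bwww\.)https?:\/\/.*/;

// Same pattern without \b: the "www." inside "foowww." now also counts.
var withoutBoundary = /^(?!.*www\.)https?:\/\/.*/;

var keptWithBoundary = urls.filter(function (u) { return withBoundary.test(u); });
var keptWithoutBoundary = urls.filter(function (u) { return withoutBoundary.test(u); });

// Both status.example.org URLs plus http://foowww.example.org/
console.log(keptWithBoundary);
// Only the two status.example.org URLs
console.log(keptWithoutBoundary);
```

The g flag from the question is left off on purpose: with .test(), a global regex remembers lastIndex between calls, so reusing one /g regex object across array elements can make it return false for a string that should match.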

Avinash Raj
  • Whoaaa, thanks a lot! Now I need to grasp why it works. I mean, I can "read" it: I see that there is a negative lookahead and a word boundary... and now I see that the reference at W3Schools is very limited; there is more here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp Cool, THX! For a bash guy like me it takes some time to get used to the logic of this type of regex. – runlevel0 Jan 13 '15 at 16:07
  • Hi Avinash: I did understand it by reading it slowly, step by step. What I mean is that I need to learn to "speak" and "think" it the same way I am used to thinking and speaking Perl or Bash regexes. It's like learning a new language ;) Thanks for your interest! – runlevel0 Jan 14 '15 at 10:08

To negate 'www' at every position in the input string:

var a = [
    'http://www.example.org/',
    'http://status.example.org/index.php?datacenter=1',
    'https://status.example.org/index.php?datacenter=2',
    'https://www.example.org/Insights',
    'http://www.example.org/Careers/Job_Opportunities',
    'http://www.example.org/Insights/Press-Releases'
];
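// Keep only the strings in which "www" does not start at any position.
var withoutWww = a.filter(function(x){ return /^((?!www).)*$/.test(x); });
// withoutWww holds the two status.example.org URLs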

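So at every position, the lookahead first checks that 'www' does not start there, and only then is one character matched with (.). Because the pattern is anchored with ^ and $, this check has to pass at every position of the string.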
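Relating this to the !www idiom from the question: in JavaScript the negation can also live outside the regex. Test for the plain substring and invert the boolean with !, much like grep -v. A small sketch comparing that with the tempered pattern from this answer, using the question's six URLs; both filters should keep only the two status.example.org URLs.

```javascript
var urls = [
  'http://www.example.org/',
  'http://status.example.org/index.php?datacenter=1',
  'https://status.example.org/index.php?datacenter=2',
  'https://www.example.org/Insights',
  'http://www.example.org/Careers/Job_Opportunities',
  'http://www.example.org/Insights/Press-Releases'
];

// Tempered pattern from this answer: each character is consumed only at a
// position where "www" does not start. (The . does not match line breaks,
// which is fine here because each string is a single URL.)
var tempered = urls.filter(function (x) { return /^((?!www).)*$/.test(x); });

// Negation outside the regex, like grep -v: keep strings where /www/ does NOT match.
var inverted = urls.filter(function (x) { return !/www/.test(x); });

console.log(tempered);
console.log(inverted);
```

The inverted test is usually easier to read; the tempered pattern is mainly useful when the "not www" condition has to be embedded inside a larger regex.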

1983