
I am trying to create some regex patterns that match a website domain.

The rules are as follows:

For France, the URL must have /fr-fr (followed by anything else) after the domain name, i.e. www.domain.com/fr-fr/anything
For Germany, the URL must have /de-de (followed by anything else) after the domain name, i.e. www.domain.com/de-de/anything
For all other countries, the URL can be the root domain (i.e. www.domain.com) OR have anything EXCEPT fr-fr and de-de after the domain name

I have these regex patterns for France and Germany, which work fine:

https?://www.domain.com.*?/(?i)FR-FR.*

and

https?://www.domain.com.*?/(?i)DE-DE.*

However, I am struggling to write a regex pattern that matches the root domain and any other path (such as www.domain.com/en-us with anything after it) but EXCLUDES /fr-fr.* and /de-de.*

I have tried a negative lookahead, such as this (for example, NOT France):

https?://www.domain.com.*?/(?!fr-fr).*

But this does not seem to work, and matches against URLs that it should not.
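
To reproduce the problem, here is a minimal sketch. It assumes the framework's matching behaves like Java's `Matcher.matches()` (a whole-string match); the class and method names are only for illustration and are not part of the framework:

```java
import java.util.regex.Pattern;

public class LookaheadRepro {
    // The "NOT France" attempt from above, unchanged.
    static final Pattern NOT_FRANCE =
            Pattern.compile("https?://www.domain.com.*?/(?!fr-fr).*");

    // Assumption: the framework requires the whole URL to match the pattern.
    static boolean matches(String url) {
        return NOT_FRANCE.matcher(url).matches();
    }

    public static void main(String[] args) {
        // Expected true, and it is true.
        System.out.println(matches("https://www.domain.com/en-us/page"));
        // Expected false, but this prints true.
        System.out.println(matches("https://www.domain.com/fr-fr/page"));
        // Expected true (root domain), but this prints false.
        System.out.println(matches("https://www.domain.com"));
    }
}
```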

Maybe I am missing something obvious.

Any help very much appreciated.

Sangam Belose
  • What language/tool are you using, and do you even need to use a regex here to validate your URLs? – Tim Biegeleisen Nov 17 '17 at 11:34
  • You may use `(?i)https?://www\.domain\.com[^/]*/(?!DE-DE|FR-FR).*` – Wiktor Stribiżew Nov 17 '17 at 11:36
  • You don't need to do any negation. If it matches for Germany, it's Germany. If it matches for France, it's France. Done, no more regex tests necessary. You don't need to check that it *doesn't* match Germany or France after the first two tests have failed. – Tomalak Nov 17 '17 at 11:36
  • Remove the `.*?`. – melpomene Nov 17 '17 at 11:41
  • Hi...thanks for the replies. Tim, it's Java and it's an e-commerce framework with built-in functionality to match a site against a regex, so yes, I need regex. Tomalak - thanks, but I can't override the existing mechanism that simply does a 1:1 match of the URL against its regex. Melpomene - the .*? is needed in case any port numbers are specified before the first /. Wiktor - thanks, I will try that and see if it works! – Chris Marks Nov 17 '17 at 11:47
  • Wiktor, thanks for the reply - but the regex does not appear to match a base URL of https://www.example.com. Also, it appears to match https://www.example.com/fr-fr when it shouldn't. – Chris Marks Nov 17 '17 at 12:01

1 Answer


Only "Germany" URLs:

^(?i)https?://www\.domain\.com(:\d+)?/de-de(/.*)?$

Only "France" URLs:

^(?i)https?://www\.domain\.com(:\d+)?/fr-fr(/.*)?$

URLs that are neither "Germany" nor "France":

^(?i)https?://www\.domain\.com(:\d+)?(?!/fr-fr|/de-de)(/.*)?$
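
A minimal sketch of how the three patterns could be checked in Java, assuming the framework does a whole-string match like `Matcher.matches()`. The class name `SiteUrlPatterns` and the helper `classify` are hypothetical, and the dots in the host name are escaped here as a precaution so that `.` only matches a literal dot:

```java
import java.util.regex.Pattern;

public class SiteUrlPatterns {
    static final Pattern GERMANY = Pattern.compile(
            "^(?i)https?://www\\.domain\\.com(:\\d+)?/de-de(/.*)?$");
    static final Pattern FRANCE = Pattern.compile(
            "^(?i)https?://www\\.domain\\.com(:\\d+)?/fr-fr(/.*)?$");
    // The lookahead sits right after the optional port, so it inspects
    // the first path segment and nothing can be skipped in front of it.
    static final Pattern OTHER = Pattern.compile(
            "^(?i)https?://www\\.domain\\.com(:\\d+)?(?!/fr-fr|/de-de)(/.*)?$");

    // Hypothetical helper: label of the first pattern matching the whole URL.
    static String classify(String url) {
        if (GERMANY.matcher(url).matches()) return "germany";
        if (FRANCE.matcher(url).matches()) return "france";
        if (OTHER.matcher(url).matches()) return "other";
        return "none";
    }

    public static void main(String[] args) {
        String[] urls = {
            "https://www.domain.com",
            "https://www.domain.com/en-us/page",
            "https://www.domain.com/fr-fr",
            "http://www.domain.com:8080/DE-DE/shop",
            "https://www.domain.com/fr-french"
        };
        for (String url : urls) {
            System.out.println(url + " -> " + classify(url));
        }
    }
}
```

One edge case to be aware of: a path such as `/fr-french` matches none of the three patterns, because the lookahead rejects anything starting with `/fr-fr` while the France pattern requires `/fr-fr` to be followed by `/` or the end of the URL. Whether that matters depends on the paths the site actually uses.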
Tomalak
  • Awesome! Just put these into my JUnit test that tests all of my combinations and it seems perfect! Thanks ever so much Tomalak – Chris Marks Nov 17 '17 at 14:34