
As the title suggests, I'm trying to retrieve the domain from a string using a JavaScript regular expression.

Take the following strings:

String                                  ==>     Return
"google"                                ==>     null
"google.com"                            ==>     "google.com"
"www.google.com"                        ==>     "www.google.com"
"ftp://ftp.google.com"                  ==>     "ftp.google.com"
"http://www.google.com"                 ==>     "www.google.com"
"http://www.google.com/"                ==>     "www.google.com"
"https://www.google.com/"               ==>     "www.google.com"
"https://www.google.com.sg/"            ==>     "www.google.com.sg"
"https://www.google.com.sg/search/"     ==>     "www.google.com.sg"
"*://www.google.com.sg/search/"         ==>     "www.google.com.sg"

I've already read "Regex to find domain name without www - Stack Overflow" and "Extract root domain name from string - Stack Overflow" but they were too complicated so I tried writing my own regular expression:

var re = new RegExp("[\\w]+[\\.\\w]+"); // same as the literal /[\w]+[\.\w]+/
re.exec(document.URL);

which works fine with "google.com", "www.google.com" and "www.google.com.sg", but returns "http" for "http://google.com/", "http://www.google.com/", etc.

As I am new to regular expressions, I can't seem to figure out what's wrong... any ideas?

Thanks in advance!

Unihedron
Cheejyg

2 Answers

11

Use this regex:

/(?:[\w-]+\.)+[\w-]+/

Here is a regex demo!

Sampling:

>>> var regex = /(?:[\w-]+\.)+[\w-]+/
>>> regex.exec("google.com")
... ["google.com"]
>>> regex.exec("www.google.com")
... ["www.google.com"]
>>> regex.exec("ftp://ftp.google.com")
... ["ftp.google.com"]
>>> regex.exec("http://www.google.com")
... ["www.google.com"]
>>> regex.exec("http://www.google.com/")
... ["www.google.com"]
>>> regex.exec("https://www.google.com/")
... ["www.google.com"]
>>> regex.exec("https://www.google.com.sg/")
... ["www.google.com.sg"]
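
The sampling above can be re-run against the full table from the question, including the "google" and "*://" rows that are not sampled. This is only a minimal sketch: the regex is the one from this answer, while `extractDomain` and `cases` are hypothetical names introduced for illustration.

```javascript
// Regex copied from the answer; extractDomain is a hypothetical helper name.
const DOMAIN_RE = /(?:[\w-]+\.)+[\w-]+/;

function extractDomain(input) {
  const match = DOMAIN_RE.exec(input);
  // exec() returns null when nothing matches; otherwise match[0] is the full match.
  return match ? match[0] : null;
}

// [input, expected] pairs taken from the table in the question.
const cases = [
  ["google", null],
  ["google.com", "google.com"],
  ["www.google.com", "www.google.com"],
  ["ftp://ftp.google.com", "ftp.google.com"],
  ["http://www.google.com", "www.google.com"],
  ["http://www.google.com/", "www.google.com"],
  ["https://www.google.com/", "www.google.com"],
  ["https://www.google.com.sg/", "www.google.com.sg"],
  ["https://www.google.com.sg/search/", "www.google.com.sg"],
  ["*://www.google.com.sg/search/", "www.google.com.sg"],
];

for (const [input, expected] of cases) {
  const actual = extractDomain(input);
  const status = actual === expected ? "ok" : "MISMATCH";
  console.log(JSON.stringify(input), "==>", JSON.stringify(actual), status);
}
```

Note that the regex returns the first dotted run it finds, so for a URL whose host contains no dot (for example "http://localhost/index.html") it would match "index.html" from the path instead.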
Unihedron
  • Omgawd thanks! Love that regex, short n' sweet~ Although I'm still trying to figure out how it works lol... Also, `>>> regex.exec("ftp://www.google.com") ... ["ftp.google.com"]`, how'd you get that? haha :) – Cheejyg Aug 15 '14 at 08:42
  • 1
    just to add few bits, a domain name may also have a hyphen sign `-`, may you need to adjust the same. – pushpraj Aug 15 '14 at 08:46
  • @pushpraj and how would I add it to the regex? I'm not really that good at regex so yeah... lol – Cheejyg Aug 15 '14 at 08:51
  • @Unihedron I still can't figure it out how u did it, care to explain how it works? – Cheejyg Aug 15 '14 at 10:04
  • 2
    @Cheejyg What we would like to match is a `aaa.bbb(.ccc.ddd.eee...)` sequence. I did this by quoting the characters as `[\w-]+` (any Word Character or hyphens), having another group as the characters with a dot `(?:[\w-]+\.)`, and quantify it to allow matching more than one time. `+`. – Unihedron Aug 15 '14 at 10:14
  • @Unihedron Cool I understand now, thanks! Btw, is this, `/[\w-]+(?:\.[\w-]+)+/`, the same as your regex? – Cheejyg Aug 15 '14 at 10:50
  • Yes. Try out the demo and replace the regex in the grid with yours - it's the same. Messing around with regexes is also a good way of learning them. :) – Unihedron Aug 15 '14 at 10:52
  • @stumped221 what do you mean "only hit on"? do you want `\.(?:com|net)` *somewhere in* the sequence or *at the end of* the sequence? – Unihedron Jan 31 '17 at 13:12
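
To make the discussion in these comments concrete, here is a small sketch comparing the three patterns mentioned in this thread on one URL. It shows why the pattern from the question stops at "http": `[\w]+` first takes "http", finds that `:` cannot satisfy `[\.\w]+`, backs off to "htt", and lets the second class take the "p". The variable names are hypothetical; the patterns are copied from the question, the answer, and the comment.

```javascript
// Patterns copied from the thread; the variable names are made up for this sketch.
const fromQuestion = /[\w]+[\.\w]+/;         // no dot is actually required
const fromAnswer = /(?:[\w-]+\.)+[\w-]+/;    // one or more "label." groups, then a final label
const fromComment = /[\w-]+(?:\.[\w-]+)+/;   // a label, then one or more ".label" groups

const url = "http://www.google.com/";

console.log(fromQuestion.exec(url)[0]); // "http"
console.log(fromAnswer.exec(url)[0]);   // "www.google.com"
console.log(fromComment.exec(url)[0]);  // "www.google.com"

// Both dotted patterns reject input without a dot, which gives the null
// the question asks for with "google".
console.log(fromAnswer.exec("google"));  // null
console.log(fromComment.exec("google")); // null
```

The two dotted forms describe the same shape (labels separated by single dots, at least two labels), which is why they agree on the inputs above.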
2

You can use this regex in JavaScript:

\b(?:(?:https?|ftp):\/\/)?([^\/\n]+)\/?

RegEx Demo

anubhava
  • As a note for users, the regex __captures__ the target String. See [this](http://regex101.com/r/bQ4eJ8/2). – Unihedron Aug 15 '14 at 08:38
  • 1
    This regex doesn't really work because hardcoding `http`, `https`, `ftp`, etc. makes it very tedious and complicated to add new schemes. e.g. `"file://www.example.com/"` == `\b(?:(?:https?|ftp|file):\/\/)?([^\/\n]+)\/?` and so on.. – Cheejyg Aug 15 '14 at 09:22