7

I need to parse the domain name from a string. The string can vary and I need the exact domain.

Examples of Strings:

http://somename.de/
www.somename.de/
somename.de/
somename.de/somesubdirectory
www.somename.de/?pe=12

I need it in the following format with just the domain name, the tld, and the www, if applicable:

www.somename.de

How do I do that using C#?

George Stocker
Umair A.

4 Answers

13

As an alternative to a regex solution, you can let the System.Uri class parse the string for you. You just have to make sure the string contains a scheme.

string uriString = "http://www.google.com/search";

if (!uriString.Contains(Uri.SchemeDelimiter))
{
    uriString = string.Concat(Uri.UriSchemeHttp, Uri.SchemeDelimiter, uriString);
}

string domain = new Uri(uriString).Host;

This solution also strips any port number and converts IPv6 addresses to their canonical form.
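To tie this back to the sample strings in the question, here is a minimal sketch that wraps the same idea in a helper. The names `DomainParser` and `GetHost` are made up for illustration, and swapping the `Uri` constructor for `Uri.TryCreate` (so that bad input returns null instead of throwing) is my own assumption about what is convenient, not part of the answer above.

```csharp
using System;

public static class DomainParser
{
    // Hypothetical helper: returns the host part of the input, or null if it cannot be parsed.
    public static string GetHost(string input)
    {
        string candidate = input.Trim();

        // Uri needs a scheme, so prepend "http://" when none is present.
        if (!candidate.Contains(Uri.SchemeDelimiter))
        {
            candidate = Uri.UriSchemeHttp + Uri.SchemeDelimiter + candidate;
        }

        // TryCreate reports failure instead of throwing UriFormatException.
        return Uri.TryCreate(candidate, UriKind.Absolute, out Uri uri) ? uri.Host : null;
    }
}
```

With this, `GetHost("www.somename.de/?pe=12")` gives `www.somename.de` and `GetHost("somename.de/somesubdirectory")` gives `somename.de`. Note that the scheme check is a plain substring test, so an input without a scheme that carries `://` somewhere in its query string would not get the prefix.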

Niels van der Rest
  • Your answer looks valid too. – Umair A. Jul 25 '10 at 19:14
  • @AbdulSaboor, what would you expect? The URL contains a host name with a space in it (" blabla") which makes it an invalid host name. Just the "http://" is also an invalid URL. The `Uri` constructor expects a valid URL. – Niels van der Rest Aug 28 '14 at 07:54
  • 1. It is reported as a valid URL if I remove the space. 2. I tried with only "blabla" and it is still reported as a valid URL. I think it should not be. – Abdul Saboor Aug 28 '14 at 07:57
11

I simply used

Uri uri = new Uri("http://www.google.com/search?q=439489");
string host = uri.Host; // Host is already a string, so ToString() is not needed
return host;

because with this you can be sure.

abatishchev
  • You can't be so sure, though: your solution also accepts "h t t p : / / h t t p : / /yee" as a correct URL (without the spaces; Stack Overflow collapses the doubled http:// into one). – Saskia Sep 01 '18 at 23:41
2

I checked the Regular Expression Library, and a pattern like this might work for you:

^(([\w][\w\-\.]*)\.)?([\w][\w\-]+)(\.([\w][\w\.]*))?$
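One thing worth knowing before using it: the pattern is anchored with ^ and $ and has no provision for ":" or "/", so it only matches a bare host name, not a string with a scheme or a path. A small sketch that shows this (the class and method names are hypothetical, only the pattern comes from above):

```csharp
using System;
using System.Text.RegularExpressions;

public static class HostPatternCheck
{
    // The pattern from the answer, unchanged.
    private static readonly Regex HostPattern =
        new Regex(@"^(([\w][\w\-\.]*)\.)?([\w][\w\-]+)(\.([\w][\w\.]*))?$");

    // True only when the whole input is a bare host name.
    public static bool LooksLikeBareHost(string input)
    {
        return HostPattern.IsMatch(input);
    }
}
```

Here `LooksLikeBareHost("www.somename.de")` is true, while `LooksLikeBareHost("http://somename.de/")` and `LooksLikeBareHost("somename.de/")` are false, so the scheme and path would have to be stripped before the pattern is applied to the strings in the question.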
Brandon Satrom
  • @Umair Ashraf - you should probably explain how it doesn't work. Can you give an example of a line it doesn't match? – Kobi Jul 24 '10 at 14:38
  • I put this line straight into the Regex constructor, like (@"^(([\w][\w\-\.]*)\.)?([\w][\w\-]+)(\.([\w][\w\.]*))?$") – Umair A. Jul 24 '10 at 22:59
1

Try this:

^(?:\w+://)?([^/?]*)

This is a weak regex: it doesn't validate the string but assumes it is already a URL. It skips the protocol, if there is one, and captures everything up to the first slash. To get the domain, look at the first captured group, for example:

string url = "http://www.google.com/hello";
Match match = Regex.Match(url, @"^(?:\w+://)?([^/?]*)");
string domain = match.Groups[1].Value;

As a bonus, the capture also stops at the first ?, so the URL google.com?hello=world works as expected.

Kobi