
How can I extract the pay-level domain from a URL? Is there a Java library that does this automatically?

Noor

1 Answer


The last time I checked, I didn't find a library for this, so I ended up using this regex:

private static final Pattern URL_PATTERN = Pattern.compile(
        "(?:^|[\\W])((ht|f)tp(s?):\\/\\/|www\\.)"
                + "(([\\w\\-]+\\.){1,}?([\\w\\-.~]+\\/?)*"
                + "[\\p{Alnum}.,%_=?&#\\-+()\\[\\]\\*$~@!:/{};']*)",
        Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL);

Guava's InternetDomainName might be used to compose it out of the individual elements, though.

Example usage: for the domain name mail.google.com, parts() returns the list ["mail", "google", "com"].

ImmutableList<String> parts = InternetDomainName.from("mail.google.com").parts();
ldz
  • And then how do you use it? – Noor Apr 15 '17 at 20:37
  • Added an example. – ldz Apr 15 '17 at 20:48
  • I believe the example you mention only splits the host and returns its parts, which is fairly easy. But given those parts, and going by your example, how do you know that "google" is the pay-level domain? That's why I was asking whether there is a library for this. – Noor Apr 16 '17 at 12:41
  • Because I believe that to do this, one must have a record of the right-hand part of the host and repeatedly check whether it is a public or a private host. – Noor Apr 16 '17 at 12:42
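
The lookup described in the last two comments could be sketched roughly as follows, using only the JDK. This is a minimal illustration, not a complete solution: the class name PayLevelDomain, the method payLevelDomain and the five-entry PUBLIC_SUFFIXES set are all made up for this example. A real implementation would load the full Public Suffix List and also handle its wildcard and exception rules, which this sketch ignores.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

final class PayLevelDomain {

    // Hypothetical, deliberately tiny sample. The real Public Suffix List
    // has thousands of entries plus wildcard and exception rules.
    private static final Set<String> PUBLIC_SUFFIXES =
            Set.of("com", "org", "net", "uk", "co.uk");

    /**
     * Returns the public suffix plus the one label to its left,
     * or an empty Optional if the host has no known suffix
     * (or consists of nothing but a suffix).
     */
    static Optional<String> payLevelDomain(String url) {
        // URI.create throws IllegalArgumentException on malformed input.
        String host = URI.create(url).getHost();
        if (host == null) {
            return Optional.empty();
        }
        String[] labels = host.toLowerCase(Locale.ROOT).split("\\.");

        // Candidate suffixes shrink as i grows, so the first hit is the
        // longest matching suffix. Starting at 1 guarantees that one
        // label remains to the left of the suffix.
        for (int i = 1; i < labels.length; i++) {
            String suffix = String.join(".",
                    Arrays.copyOfRange(labels, i, labels.length));
            if (PUBLIC_SUFFIXES.contains(suffix)) {
                return Optional.of(labels[i - 1] + "." + suffix);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(payLevelDomain("http://mail.google.com/inbox").orElse("none")); // google.com
        System.out.println(payLevelDomain("https://www.bbc.co.uk/news").orElse("none"));   // bbc.co.uk
        System.out.println(payLevelDomain("http://localhost/").orElse("none"));            // none
    }
}
```

Trying the longest candidate first is what makes bbc.co.uk come out correctly: "co.uk" matches before the shorter "uk" is ever considered. If Guava is already on the classpath, InternetDomainName also offers topPrivateDomain(), which is backed by a bundled copy of the Public Suffix List and may make a hand-rolled lookup like this unnecessary.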