-1

Where can I find code (JavaScript would be best) that strips the www and the second-level domain suffix from a URL, leaving only the site name?

Example:

www.ynet.co.il -> ynet (stripped 'co.il' - two tokens)
www.nike.com -> nike (stripped 'com' - one token)

etc

As a second-best option, a full list of second-level domains (preferably in CSV, but any format will do) would be welcome as well.

BreakPhreak

2 Answers

1

If you use Java, Guava can help you here.

You can use InternetDomainName.topPrivateDomain() together with publicSuffix() to solve your problem.

Guava (as well as Mozilla/Firefox, Chrome and Opera) uses the Public Suffix List for this functionality (the raw data is published at publicsuffix.org).

tld.js is a JavaScript library that uses that data as well.
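
To illustrate how a suffix-list lookup handles the examples from the question, here is a minimal JavaScript sketch. It is not how Guava or tld.js is implemented: `SUFFIXES` is a tiny hand-picked sample rather than the real Public Suffix List, `stripDomain` is a hypothetical helper name, and the wildcard and exception rules of the real list are ignored.

```javascript
// Tiny hand-picked sample of public suffixes. Assumption: the real
// Public Suffix List is far larger and has wildcard/exception rules
// that this sketch does not handle.
const SUFFIXES = new Set(["com", "org", "net", "il", "co.il", "uk", "co.uk"]);

// Hypothetical helper: returns the label immediately to the left of the
// longest matching suffix, or null if there is no such label.
function stripDomain(hostname) {
  const labels = hostname.toLowerCase().split(".").filter(Boolean);
  // Walk from the longest candidate suffix to the shortest,
  // so that "co.il" is matched before "il".
  for (let i = 0; i < labels.length; i++) {
    const candidate = labels.slice(i).join(".");
    if (SUFFIXES.has(candidate)) {
      // If the whole hostname is itself a suffix, there is nothing to return.
      return i > 0 ? labels[i - 1] : null;
    }
  }
  return null; // no known suffix matched
}
```

With this sample set, `stripDomain("www.ynet.co.il")` returns `"ynet"` and `stripDomain("www.nike.com")` returns `"nike"`. For real use, the full list (or a library built on it) should replace the sample set.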

Joachim Sauer
0

Something like this gist, perhaps: https://gist.github.com/2428561? You could also search Google for 'javascript url parser'.

Pieter Willaert