I am parsing domains and running into a problem handling subdomains. If the domain is http://www.google.co.uk, I want to obtain the length of google which is 6.

I am using parse_url() to return the host in this case www.google.co.uk like so.

$url    = 'http://www.google.co.uk';    
$info   = parse_url($url);    
// remove www. and return google.co.uk
$new    = str_replace('www.','',$info['host']); 
$pieces = explode(".", $new); 
$len    = strlen($pieces[0]); // returns character length of google = 6
echo $len;

My code doesn't work if the domain contains a subdomain like http://test.google.co.uk: it returns a length of 4; I expect it to return a length of 6.

Any ideas?

BryanH
Chill Web Designs
  • So, in the case of http://test.google.co.uk, what would you expect the length to be? – BryanH Dec 20 '12 at 17:31
  • I only want to return the length of the domain not any sub domain. If there is a sub domain like test.google.co.uk then the length would equal 4 as the $pieces[0] would take the first section being 'test' and not google. – Chill Web Designs Dec 20 '12 at 17:34
  • Are you saying that in the case of http://test.google.co.uk, you would expect the length to be 6? – BryanH Dec 20 '12 at 17:37
  • I would like it to return 6 yes, but because of the sub domain it gives the wrong strlen() that I am after. – Chill Web Designs Dec 20 '12 at 17:38
  • You won't be able to do this without a list containing all TLDs. It is impossible to determine if in `x.y.z` x or y is the domain. imagine `google.co.uk` vs `google.com`. – ThiefMaster Dec 20 '12 at 17:42
  • Yeah, this is my issue. Would you know of a way to do this? I guess all TLDs will have to be added to an array and checked against the domain. – Chill Web Designs Dec 20 '12 at 17:44
  • might be a time to reconsider strategy -- what is the overall goal/purpose leading to this implementation route? – Nathan Dec 20 '12 at 17:47
  • If the TLD exists in the array, remove it from the url to give test.google, then explode on . and check whether the value is empty. – Chill Web Designs Dec 20 '12 at 17:47
  • the thing is -- TLDs are not constant over time, and the list is long. does one include multi-byte TLDs? how often to update list? – Nathan Dec 20 '12 at 17:48
  • if you see http://www.woorank.com/en/www/google.com and go to the usability section I am trying to achieve something like this. – Chill Web Designs Dec 20 '12 at 17:48
  • perhaps when evaluating the domain you could compare it with other entries in your database and then group those that appear to be related. trying to determine a single/isolated instance will be challenging due to the number and variety of TLDs. – Nathan Dec 20 '12 at 17:52

2 Answers

The output is correct. When the input is http://test.google.co.uk, the value of parse_url('http://test.google.co.uk')['host'] is test.google.co.uk (the host only, without the scheme). When you explode this string on the dot, the first element of the array is test, and its length is 4.
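
To make that concrete, here is a minimal sketch of what parse_url() and explode() produce for the URL from the question. The helper name hostLabels is hypothetical and used only for illustration.

```php
<?php
// Hypothetical helper: split the host of a URL into its dot-separated labels.
function hostLabels($url) {
    // PHP_URL_HOST returns only the host part, e.g. "test.google.co.uk"
    $host = parse_url($url, PHP_URL_HOST);
    return explode('.', $host);
}

$labels = hostLabels('http://test.google.co.uk');
echo implode(' | ', $labels), "\n"; // the four labels: test, google, co, uk
echo strlen($labels[0]), "\n";      // length of the first label, "test"
```

The first label is simply whatever comes before the first dot, so it is only the main domain name when the host has no subdomain.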

To get google instead of test, either strip the subdomain as you did in your first example, or take the second element of the exploded array. For example:

$url    = 'http://test.google.co.uk';    
$info   = parse_url($url);    
$pieces = explode(".", $info['host']); 
$len    = strlen($pieces[1]); // returns character length of google = 6
echo $len;
Leri

There is no other way than to collect and hardcode all known public second-level zones (like co.uk, com.ua, co.tw and so on) and filter them in your code. Be careful with a host like test.example.ua: test is a subdomain there, because both example.com.ua and example.ua are valid domains (which is not the case with the uk zone).

Your code may look like this:

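function mainDomainLength($fullDomain) {
    // $fullDomain = 'DOMAIN.co.uk';
    // Known public second-level zones, keyed by TLD (the list is incomplete).
    $zones = array('uk' => array('co'), 'ua' => array('com', 'org') /* , ... */);
    $domain = explode('.', $fullDomain);
    $count = count($domain);
    if ($count > 2 && isset($zones[$domain[$count - 1]])
        && in_array($domain[$count - 2], $zones[$domain[$count - 1]])) {
        // e.g. test.google.co.uk: skip the two-label zone "co.uk"
        return strlen($domain[$count - 3]);
    } else if ($count > 1) {
        // e.g. www.google.com or test.example.ua: the label just before the TLD
        return strlen($domain[$count - 2]);
    } else {
        // a single label with no dot at all
        return strlen($domain[0]);
    }
}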

EDIT: By the way, look at "Get the second level domain of an URL (java)". As far as I understand, it has the answer you need, along with a link to the collection of special domains collected by Mozilla.
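
As a rough sketch of how such a suffix collection could be used, the snippet below matches the host against a list of suffixes, longest first, and measures the label just before the match. The function name registrableLabelLength and the five-entry $publicSuffixes array are assumptions made for illustration. A real list is far longer, changes over time, and has wildcard and exception rules that this sketch ignores.

```php
<?php
// Illustrative subset only (an assumption); a real suffix list is much longer.
$publicSuffixes = array('com', 'uk', 'co.uk', 'ua', 'com.ua');

// Hypothetical helper: length of the label directly left of the longest
// matching suffix, or null when no listed suffix matches.
function registrableLabelLength($host, array $suffixes) {
    $labels = explode('.', strtolower($host));
    $count  = count($labels);
    // Start with the longest candidate suffix so "co.uk" wins over "uk".
    for ($i = 1; $i < $count; $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (in_array($candidate, $suffixes, true)) {
            return strlen($labels[$i - 1]);
        }
    }
    return null;
}

echo registrableLabelLength('test.google.co.uk', $publicSuffixes), "\n"; // 6 ("google")
echo registrableLabelLength('test.example.ua', $publicSuffixes), "\n";   // 7 ("example")
```

Trying the longest suffix first is what keeps example.com.ua and test.example.ua apart: the first matches com.ua, the second only ua.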

Community
Valera Leontyev