I need to compare 3 different URLs. To do that, I need to normalize them to the same form, like example.com, so I've written a preg_match() pattern. So far it reduces the URL to example.com when it looks like http://www.example.com, http://www.example.com/foo or www.example.com. The only issue is that when the URL looks like http://example.com, preg_match() doesn't match it. I think it treats it as already 'clean' and skips it. Can you tell me what I'm doing wrong?

My code looks like this:

$pattern = '/.*[\.\/]([a-zA-Z0-9\-]+\.\w{2,3})\/.*/';
$results = $reader->noHeading()->takeColumns(1)->toArray();

$cleaned = array();
for ($i = 0; $i < count($results); $i++) {
    if (preg_match($pattern, $results[$i][0], $cleaned[$i]) === 1) {
        echo "<pre>";
        var_dump($cleaned[$i][1]);
        echo "</pre>";
    }
}

Thanks for your time!

2 Answers

If you're looking for a non regex solution:

$myurl = "http://example.com";
$raw_url = parse_url($myurl); 
$domain_only = str_replace ('www.','', $raw_url['host']); 
echo $domain_only; 

http://php.net/manual/en/function.parse-url.php

parse_url() returns an array of URL components. In this case you're looking for the host, and you can simply strip www. if it is present.
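One thing to watch with this approach: parse_url() only fills in the host when the string has a scheme, so an input like www.example.com from the question would come back without a 'host' key. Here is a minimal sketch of one way to handle that. The helper name domain_only() is hypothetical (not part of PHP or of the snippet above), and defaulting to http:// for scheme-less input is my own assumption.

```php
<?php
// Hypothetical helper: reduce a URL to its host without a leading "www.".
function domain_only(string $url): ?string
{
    $url = trim($url);

    // parse_url() needs a scheme to recognise the host, so prepend one
    // for inputs such as "www.example.com" (assumption: http is acceptable).
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $url)) {
        $url = 'http://' . $url;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // could not find a host
    }

    // Strip only a leading "www." so the rest of the host is left untouched.
    return preg_replace('/^www\./i', '', strtolower($host));
}

$urls = [
    'http://www.example.com',
    'http://www.example.com/foo',
    'www.example.com',
    'http://example.com',
];

foreach ($urls as $u) {
    echo domain_only($u), PHP_EOL;
}
```

Anchoring the replacement with ^www\. instead of using str_replace() avoids removing a "www." that happens to appear later in the host name.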

Ryan Tuosto

It will work if you change your pattern to:

$pattern = '/.*[\.\/]([a-zA-Z0-9\-]+\.\w{2,3}).*/';

Kulikov Sergey
  • This did the trick, thank you! But what exactly did you change? – Armando van Oeffelen May 02 '17 at 17:52
  • I just removed the check for "/" after the host name, because your URL `http://example.com` doesn't have a slash after the host name – Kulikov Sergey May 02 '17 at 18:16
  • Hey @Kulikov Sergey, this pattern doesn't fix URLs like: `array(2) { [0]=> string(80) "http://www.architectuurtourmaastricht.nl/architectuurtourmaastricht.nl/Home.html" [1]=> string(8) "Home.htm" }` Can you please help me with this? Thanks – Armando van Oeffelen May 02 '17 at 21:22
  • @Armando van Oeffelen, try another pattern, `$pattern = '/^([^:]+:\/\/)?(\w+\.)?([a-zA-Z0-9\-]+\.\w{2,3})/';`, and change `var_dump($cleaned[$i][1]);` to `var_dump($cleaned[$i][3]);` – Kulikov Sergey May 03 '17 at 07:34
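For anyone trying the pattern from the last comment, here is a small sketch that runs it against the URLs mentioned in this thread. The $urls array is a stand-in I made up for the $reader output in the question; everything else follows the comment, including reading the host from capture group 3.

```php
<?php
// Pattern copied from the comment above: optional scheme, optional one-label
// prefix such as "www.", then the host in capture group 3.
$pattern = '/^([^:]+:\/\/)?(\w+\.)?([a-zA-Z0-9\-]+\.\w{2,3})/';

// Stand-in for the rows returned by $reader in the question (assumption).
$urls = [
    'http://www.example.com',
    'http://www.example.com/foo',
    'www.example.com',
    'http://example.com',
    'http://www.architectuurtourmaastricht.nl/architectuurtourmaastricht.nl/Home.html',
];

$cleaned = [];
foreach ($urls as $url) {
    // Because the pattern is anchored with ^, a path segment like
    // "Home.html" can no longer be mistaken for the host.
    if (preg_match($pattern, $url, $m) === 1) {
        $cleaned[$url] = $m[3];
    }
}

print_r($cleaned);
```

Note that \w{2,3} limits the last label to three characters, so longer TLDs and two-part suffixes such as .co.uk are not handled by this pattern; a parse_url()-based approach may be a safer choice if those can appear in the data.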