
I'm using an API that returns a set of URLs. All of the URLs redirect, but how many redirects there are and where the URLs end up is unknown.

So what I'm trying to do is trace the redirect path and find the last URL.

I basically want to do the same as http://wheregoes.com/retracer.php, but I only need to know the last URL.

I've found a way to do it with cURL, but the trace stops when it hits a meta refresh.

I've seen this thread: PHP: Can CURL follow meta redirects, but it doesn't help me much.

This is my current code:

function trace_url($url){
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_FOLLOWLOCATION => TRUE,
        CURLOPT_RETURNTRANSFER => TRUE,
        CURLOPT_SSL_VERIFYHOST => FALSE,
        CURLOPT_SSL_VERIFYPEER => FALSE,
    ));

    curl_exec($ch);
    $url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    curl_close($ch);

    return $url;
}

$lasturl = trace_url('http://myurl.org');

echo $lasturl;
StaalCtrl
  • You need to write a script that follows the meta redirect. The link you provided points you in the right direction: for every meta refresh you need to make a new cURL request. – Scriptman Mar 17 '17 at 14:30
  • Yes, I've figured as much. I need some help with the script for handling the meta refreshes. – StaalCtrl Mar 17 '17 at 15:02
  • How does the question [PHP: Can CURL follow meta redirects](http://stackoverflow.com/questions/1820705/php-can-curl-follow-meta-redirects) not help? How did you use it? – hassan Mar 17 '17 at 15:30

1 Answer


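Well, there is a big difference between header redirects, which are HTTP responses in the 3xx class carrying a Location header, and a META refresh. The first is part of the HTTP response itself. The second is an instruction inside the HTML page, and only a client that reads the page (a browser, for example) acts on it.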

cURL (libcurl), which in your case runs on the server, works at the HTTP level and never parses the HTML it downloads. With CURLOPT_FOLLOWLOCATION it can therefore follow only the first type: header redirects, also called HTTP redirects.
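As an illustration, here is a minimal sketch of roughly the check that a header redirect involves. The decision comes entirely from the status line and the Location header, and the body is never looked at. The helper name `location_from_headers()` is hypothetical. It assumes you fetched a single response with CURLOPT_HEADER => TRUE and CURLOPT_FOLLOWLOCATION => FALSE, and that you cut off the header block, for example the first CURLINFO_HEADER_SIZE bytes of the output.

```php
<?php
// Hypothetical helper: given the raw header block of ONE response, return the
// Location target if the status is 3xx, otherwise null.
function location_from_headers(string $rawHeaders): ?string
{
    $lines = preg_split('/\r?\n/', trim($rawHeaders));

    // Status line, e.g. "HTTP/1.1 301 Moved Permanently" or "HTTP/2 302"
    if (!preg_match('#^HTTP/\S+\s+(\d{3})#', $lines[0], $m)) {
        return null;
    }
    $status = (int) $m[1];
    if ($status < 300 || $status > 399) {
        // Not a header redirect. A meta refresh, if any, would be in the body.
        return null;
    }

    foreach (array_slice($lines, 1) as $line) {
        // Header names are case-insensitive
        if (stripos($line, 'location:') === 0) {
            return trim(substr($line, strlen('location:')));
        }
    }
    return null; // e.g. 304 Not Modified carries no Location
}

$headers = "HTTP/1.1 301 Moved Permanently\r\nLocation: http://example.com/next\r\nContent-Length: 0\r\n\r\n";
echo location_from_headers($headers), "\n";
```

A 200 response whose HTML contains a meta refresh returns null here. That is exactly why cURL stops at such a page.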

So you will need to handle meta refreshes manually. There are a bunch of ways to extract the URL, but the basic steps are:

1) Scrape the web page contents.

2) Extract the link from the meta tag.

3) Request this new link if you want, and repeat until a page has no meta refresh.
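Put together, the three steps form a loop. Each hop fetches a page, with cURL following any header redirects. The loop then looks for a meta refresh and, if it finds one, resolves the target and goes round again. This is a sketch under several assumptions. The names `trace_final_url()` and `resolve_url()` are hypothetical. The regex assumes `http-equiv` appears before `content` in the tag. `resolve_url()` is a deliberately minimal resolver that does not normalise `../` segments. The fetcher is passed in as a callable so the loop can be tried without network access. `$curlFetch` shows one cURL-based fetcher along the lines of the question's `trace_url()`.

```php
<?php
// Hypothetical: follow meta refreshes hop by hop. $fetch(url) must return
// ['url' => effective URL after HTTP redirects, 'body' => HTML].
function trace_final_url(string $url, callable $fetch, int $maxHops = 10): string
{
    for ($hop = 0; $hop < $maxHops; $hop++) {
        $page = $fetch($url);          // step 1: get the page
        $url  = $page['url'];
        // step 2: <meta http-equiv="refresh" content="N; url=...">
        $found = preg_match(
            '#<meta[^>]+http-equiv=["\']?refresh["\']?[^>]*content=["\']?\s*\d*\s*;\s*url=["\']?([^"\'>]+)#i',
            (string) $page['body'],
            $m
        );
        if (!$found) {
            return $url;               // no meta refresh: this is the last URL
        }
        // step 3: go to the new link, resolved against the current URL
        $url = resolve_url($url, html_entity_decode(trim($m[1])));
    }
    return $url;                       // gave up after $maxHops (possible loop)
}

// Hypothetical minimal resolver: absolute, scheme-relative, root-relative and
// path-relative links only; "../" segments are not normalised.
function resolve_url(string $base, string $ref): string
{
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $ref)) {
        return $ref;                                   // already absolute
    }
    $p = parse_url($base);
    if (strpos($ref, '//') === 0) {
        return $p['scheme'] . ':' . $ref;              // scheme-relative
    }
    $origin = $p['scheme'] . '://' . $p['host'] . (isset($p['port']) ? ':' . $p['port'] : '');
    if (strpos($ref, '/') === 0) {
        return $origin . $ref;                         // root-relative
    }
    $dir = isset($p['path']) ? preg_replace('#/[^/]*$#', '/', $p['path']) : '/';
    return $origin . $dir . $ref;                      // relative to current dir
}

// A cURL-based fetcher (needs network access; defined here but not called).
$curlFetch = function (string $url): array {
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_FOLLOWLOCATION => true, // header redirects handled by cURL
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_MAXREDIRS      => 10,
    ]);
    $body  = curl_exec($ch);
    $final = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    curl_close($ch);
    return ['url' => $final, 'body' => $body === false ? '' : $body];
};

// Offline demo with a fake fetcher standing in for the network.
$pages = [
    'http://a.test/'     => ['url' => 'http://b.test/start',
                             'body' => '<meta http-equiv="refresh" content="0; url=/next">'],
    'http://b.test/next' => ['url' => 'http://b.test/next', 'body' => '<html>done</html>'],
];
$fake = function (string $u) use ($pages): array { return $pages[$u]; };
echo trace_final_url('http://a.test/', $fake), "\n";
```

With real pages you would call `trace_final_url('http://myurl.org', $curlFetch)`. The `$maxHops` cap keeps a page that refreshes to itself from looping forever.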


Starting from your example:

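function trace_url($url){
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_FOLLOWLOCATION => TRUE, // follow HTTP 3xx (Location header) redirects
        CURLOPT_RETURNTRANSFER => TRUE, // return the page instead of printing it
        CURLOPT_SSL_VERIFYHOST => FALSE,
        CURLOPT_SSL_VERIFYPEER => FALSE,
    ));

    // keep the body this time: the meta refresh is in the HTML, not in the headers
    $body = curl_exec($ch);
    $url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    curl_close($ch);

    // curl_exec() returns FALSE on failure, so cast to an empty string
    return array('url' => $url, 'body' => (string) $body);
}

$response = trace_url('http://myurl.org');

// quick pattern for explanation purposes only, you may improve it as you like
preg_match('#<meta[^>]*content=["\']?\d*\s*;\s*url=["\']?([^"\'>]+)#i', $response['body'], $links);

// fall back to the URL cURL ended on when there is no meta refresh
$newLink = isset($links[1]) ? $links[1] : $response['url'];

Or, as in the solution from the thread mentioned in your question (which uses simplexml_load_file), parse the page and query it with XPath. Note that simplexml_load_file() and simplexml_load_string() only accept well-formed XML, so most real HTML pages will fail to load. DOMDocument::loadHTML() is more forgiving:

$doc = new DOMDocument();
if ($response['body'] !== '') {
    @$doc->loadHTML($response['body']); // @ hides warnings about malformed HTML
}
$xpath = new DOMXPath($doc);
$meta = $xpath->query("//meta[@http-equiv='refresh']");
// the URL is inside the content attribute, e.g. "0;url=http://..."
$content = $meta->length ? $meta->item(0)->getAttribute('content') : '';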
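Whichever parser you use, the XPath query only gets you the meta element. The target URL is still embedded in its content attribute, which typically looks like `0;url=http://...`: the delay in seconds, a separator, then the URL. Here is a minimal sketch of pulling the URL out of that value. The helper name `refresh_target()` is hypothetical, and it assumes the common `N; url=...` form, optionally with the URL in quotes.

```php
<?php
// Hypothetical helper: extract the target URL from a refresh value such as
// "0;url=http://example.com/" or "5; URL='page.html'". A bare delay like "3"
// has no target (it just reloads the same page), so null is returned.
function refresh_target(string $content): ?string
{
    // delay, ";" or ",", optional "url=", optional quote, URL, matching quote
    if (!preg_match('#^\s*\d+(?:\.\d*)?\s*[;,]\s*(?:url\s*=\s*)?(["\']?)(.+?)\1\s*$#i', $content, $m)) {
        return null;
    }
    return $m[2];
}

echo refresh_target('0;url=http://example.com/'), "\n";
echo refresh_target("5; URL='page.html'"), "\n";
var_dump(refresh_target('3'));
```

The result can still be relative (like `page.html` above), so resolve it against the URL of the page it came from before requesting it.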
hassan