I know what you are thinking: "there are many similar questions". But no, you are wrong.

It is true that many questions ask for PHP code that obtains the last URL. I based my code on those questions and answers, but it does not work in all cases.

I need a function that always returns the last URL (even if there are 1000 redirects), regardless of whether the redirection was done with JavaScript, PHP, Apache or another technology.

In other words, my code returns the last URL only in some cases. I have read a lot about this without finding a solution. I have been stuck on this problem for a month and need your help.

My code is the following:

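function get_real_url($comparator, $url){
    // Note: $comparator is not used inside this function.
    $out = array();
    $final_url = get_final_url($url);
    if(strpos($final_url, 'url=') !== FALSE){ // The result carries the target in a "url" parameter
        parse_str($final_url, $out);
        if(empty($out["url"])){
            // parse_str() expects a bare query string; for a complete URL,
            // retry with only its query component.
            $query = parse_url($final_url, PHP_URL_QUERY);
            if($query)
                parse_str($query, $out);
        }
        if(!empty($out["url"]))
            return $out["url"];
        else
            return false;
    }else
        return $final_url; // The result is already a complete URL
}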

function get_final_url($url, $timeout = 5, $max_hops = 10)
{
    // Stop after $max_hops manual hops so that a redirect loop cannot recurse forever.
    if ($max_hops <= 0) return $url;
    //$url = str_replace( "&amp;", "&", urldecode(trim($url)) );
    $cookie = tempnam ("/tmp", "CURLCOOKIE");
    $ch = curl_init();
    curl_setopt( $ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1" );
    curl_setopt( $ch, CURLOPT_URL, $url );
    curl_setopt( $ch, CURLOPT_COOKIEJAR, $cookie );
    curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, true ); // follows HTTP (Location header) redirects only
    curl_setopt( $ch, CURLOPT_ENCODING, "" );
    curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
    curl_setopt( $ch, CURLOPT_AUTOREFERER, true );
    curl_setopt( $ch, CURLOPT_CONNECTTIMEOUT, $timeout );
    curl_setopt( $ch, CURLOPT_TIMEOUT, $timeout );
    curl_setopt( $ch, CURLOPT_MAXREDIRS, 10 );
    $content = curl_exec( $ch ); // false on failure, e.g. when CURLOPT_MAXREDIRS is exceeded
    $response = curl_getinfo( $ch );
    curl_close ( $ch );
    if ($response['http_code'] == 301 || $response['http_code'] == 302)
    {
        // cURL stopped on a redirect response, so read the Location header by hand.
        ini_set("user_agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1");
        $headers = @get_headers($response['url']);
        if(!$headers) return $url;
        foreach($headers as $value)
        {
            if (substr(strtolower($value), 0, 9) == "location:")
                return get_final_url( trim( substr( $value, 9 ) ), $timeout, $max_hops - 1 );
        }
    }
    // Only JavaScript redirects written as a plain string literal are detected here.
    // The quantifiers are non-greedy so the match ends at the first closing quote.
    if(is_string($content) && (preg_match("/window\.location\.replace\('(.*?)'\)/i", $content, $value) || preg_match("/window\.location\=\"(.*?)\"/i", $content, $value)))
    {
        return get_final_url($value[1], $timeout, $max_hops - 1);
    }
    else
    {
        return $response['url'];
    }
}

What technology should I use so that my code executes JavaScript redirection code (if necessary) and gets the last URL?

Carlos
  • Possible duplicate of [PHP - Detect the incoming url requesting php page from another source/url](https://stackoverflow.com/questions/9790771/php-detect-the-incoming-url-requesting-php-page-from-another-source-url) – weegee Jun 17 '19 at 08:46
  • The HTTP_REFERER header can also be removed for privacy reasons by anyone in the browser. This will not always work – weegee Jun 17 '19 at 08:48
  • It is not a duplicate, please read my question until the end. – Carlos Jun 17 '19 at 08:50
  • Not sure how clear your question is here. What do you mean by the 'last url'? I think you mean that given a url, follow redirects (if applicable) until there are no more, and that's the url you are trying to discover. – Progrock Jun 17 '19 at 09:27
  • Exactly. But no other question takes following JS redirects into account (when they occur). What I'm trying to say is that I need to add the code necessary to follow redirects made from JS. – Carlos Jun 17 '19 at 10:00

1 Answer

If you also want to check for browser-based redirects, you should not use PHP. This will get really tricky if you have to parse and evaluate not only JS code that is directly embedded in the markup, but also module-based code that is only loaded after running some JS.
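
To illustrate why static parsing falls short, here is a minimal sketch. The helper `find_literal_js_redirect` and the two sample pages are hypothetical and made up for this example. A regular expression can find a redirect target that is written as a string literal, but it finds nothing once the target is computed at runtime:

```php
<?php
// Hypothetical helper: finds only redirect targets written as string literals.
function find_literal_js_redirect(string $html): ?string
{
    $patterns = [
        // location.replace('...') or location.assign('...')
        '/location\.(?:replace|assign)\(\s*["\']([^"\']+)["\']\s*\)/i',
        // location = '...' or location.href = '...'
        '/location(?:\.href)?\s*=\s*["\']([^"\']+)["\']/i',
    ];
    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $html, $m)) {
            return $m[1];
        }
    }
    return null; // nothing that looks like a literal redirect target
}

// Made-up sample pages.
$literal  = "<script>window.location.replace('https://example.com/a');</script>";
$computed = "<script>var t = 'https://example.com/' + 'b'; window.location = t;</script>";

var_dump(find_literal_js_redirect($literal));  // the literal target, https://example.com/a
var_dump(find_literal_js_redirect($computed)); // NULL: the target only exists once the script runs
```

The second page redirects just as reliably in a browser, yet no amount of pattern matching recovers its target without actually evaluating the script.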

Why not use something like Selenium or a headless browser for this?
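
As a rough sketch of how this could be driven from PHP: the code below assumes the `php-webdriver/webdriver` Composer package is installed and that a Selenium server (or chromedriver) is already listening at the given address. The function name `get_final_url_with_browser`, the server URL and the fixed wait time are all assumptions made for illustration, not a tested setup.

```php
<?php
// Assumption: Composer's autoloader has been loaded by the surrounding
// application, e.g. require __DIR__ . '/vendor/autoload.php';

use Facebook\WebDriver\Chrome\ChromeOptions;
use Facebook\WebDriver\Remote\DesiredCapabilities;
use Facebook\WebDriver\Remote\RemoteWebDriver;

// Hypothetical helper: loads $url in headless Chrome and returns the URL
// the browser ends up on after HTTP and JavaScript redirects.
function get_final_url_with_browser(
    string $url,
    string $serverUrl = 'http://localhost:4444', // assumed Selenium address
    int $settleSeconds = 3                       // assumed wait for JS redirects
): string {
    $options = new ChromeOptions();
    $options->addArguments(['--headless']);

    $capabilities = DesiredCapabilities::chrome();
    $capabilities->setCapability(ChromeOptions::CAPABILITY, $options);

    $driver = RemoteWebDriver::create($serverUrl, $capabilities);
    try {
        $driver->get($url);     // the browser follows HTTP redirects itself
        sleep($settleSeconds);  // crude: give scripts time to redirect
        return $driver->getCurrentURL();
    } finally {
        $driver->quit();        // always close the browser session
    }
}
```

The returned string can then be passed to any other PHP function. A fixed `sleep()` is the simplest possible wait and will miss redirects that fire later; an explicit wait condition would likely be more robust, at the cost of having to define when a page counts as "settled".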

Nico Haase
  • Could you give me an example of using the technologies you mentioned? I have never read about a "headless browser" and cannot imagine how to integrate it with PHP (I need to pass the final URL to a PHP function that is responsible for doing other things). – Carlos Jun 17 '19 at 08:48
  • Well, that might be impossible due to the reasons I gave in the answer. Parsing, evaluating, or even running JS code that is not meant to be run without any interaction is not that easy – Nico Haase Jun 17 '19 at 09:15