
I'm currently working on a tool that integrates links from different social networks:

Facebook: https://www.facebook.com/jonathan.parentlevesque

Google plus: https://plus.google.com/+JonathanParentL%C3%A9vesque

Instagram: https://instagram.com/mariloubiz/

Pinterest: https://www.pinterest.com/jonathan_parl/

RSS: https://regex101.com

Twitter: https://twitter.com/arcadefire

Vimeo: https://vimeo.com/ondemand/crashtest/135301838

Youtube: https://www.youtube.com/user/Darkjo666

I'm using very basic regexes like this one:

/^https?:\/\/(?:(?:[a-z]{2}|www)\.)?pinterest\.com\/\S{5,}$/i

on both the client and server side for minimal domain validation of each link.
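Keeping one pattern per network makes the server-side check easy to reuse. This is a minimal sketch, not the asker's code: `SOCIAL_PATTERNS` and `matchesNetwork()` are hypothetical names, the Twitter pattern is an illustrative assumption, and the Pinterest pattern escapes the dots and makes the whole subdomain prefix (including its dot) optional, so both `https://pinterest.com/...` and `https://www.pinterest.com/...` match while `https://.pinterest.com/...` does not.

```php
<?php
// Hypothetical table of per-network patterns (only Pinterest's is based on the question).
const SOCIAL_PATTERNS = [
    // Optional two-letter or "www" subdomain, then pinterest.com/ and at least 5 non-space chars.
    'pinterest' => '/^https?:\/\/(?:(?:[a-z]{2}|www)\.)?pinterest\.com\/\S{5,}$/i',
    // Illustrative assumption: optional "www.", then twitter.com/ and a non-empty path.
    'twitter'   => '/^https?:\/\/(?:www\.)?twitter\.com\/\S+$/i',
];

function matchesNetwork(string $network, string $url): bool
{
    if (!array_key_exists($network, SOCIAL_PATTERNS)) {
        return false; // unknown network: reject
    }
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match(SOCIAL_PATTERNS[$network], $url) === 1;
}

var_dump(matchesNetwork('pinterest', 'https://www.pinterest.com/jonathan_parl/')); // bool(true)
var_dump(matchesNetwork('pinterest', 'https://pinterest.com/jonathan_parl/'));     // bool(true)
var_dump(matchesNetwork('pinterest', 'https://.pinterest.com/jonathan_parl/'));    // bool(false)
var_dump(matchesNetwork('twitter', 'https://twitter.com/arcadefire'));             // bool(true)
```

The same patterns can be mirrored in the client-side JavaScript so both sides reject the same inputs.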

Then I use this function to check that the page really exists (after all, there's no point in integrating social network links that don't work):

public static function isUrlExists($url){

    $exists = false;

    if(!StringManager::stringStartWith($url, "http") and !StringManager::stringStartWith($url, "ftp")){

        $url = "https://" . $url;
    }

    if (preg_match(RegularExpression::URL, $url)){

        $headers = get_headers($url);

        if ($headers !== false and !empty($headers)){

            if (strpos($headers[0], '404') === false){

                $exists = true;
            }   
        }
    }

    return $exists;
}
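One caveat about checking `$headers[0]` in the function above: by default `get_headers()` follows redirects and returns the header lines of every response in order, so `$headers[0]` is the status line of the first response (for example a 301), not the final one. A minimal sketch, assuming you keep the `get_headers()` approach, that picks the last status line instead; `finalStatusLine()` is a hypothetical helper name.

```php
<?php
/**
 * Hypothetical helper: return the last HTTP status line from a get_headers()
 * result, i.e. the status of the final response after any redirects.
 * Status lines always sit under integer keys, in both the default and the
 * associative (format = 1) layouts.
 */
function finalStatusLine(array $headers): ?string
{
    $status = null;
    foreach ($headers as $key => $line) {
        // Status lines start with "HTTP/"; ordinary header lines do not.
        if (is_int($key) && is_string($line) && strncmp($line, 'HTTP/', 5) === 0) {
            $status = $line; // keep overwriting so the last one wins
        }
    }
    return $status;
}

// Example shaped like get_headers() output for a redirect that ends in a 404.
$sample = [
    'HTTP/1.1 301 Moved Permanently',
    'Location: https://www.example.com/missing',
    'HTTP/1.1 404 Not Found',
    'Content-Type: text/html',
];

echo finalStatusLine($sample), PHP_EOL; // HTTP/1.1 404 Not Found
```

In the function above, `strpos($headers[0], '404')` could then become a check on `(string) finalStatusLine($headers)`.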

Note: In this function I'm using Diego Perini's regex for validating the URL before sending the request:

const URL = "%^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@|\d{1,3}(?:\.\d{1,3}){3}|(?:(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)(?:\.(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)*(?:\.[a-z\x{00a1}-\x{ffff}]{2,6}))(?::\d+)?(?:[^\s]*)?$%iu"; //@copyright Diego Perini

None of the links I've tested so far generated any errors, but testing Pinterest produces this rather scary series of error messages:

get_headers(): SSL operation failed with code 1. OpenSSL Error messages: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

Array
(
    [url] => https://www.pinterest.com/jonathan_parl/
    [exists] => 
)

get_headers(): Failed to enable crypto

Array
(
    [url] => https://www.pinterest.com/jonathan_parl/
    [exists] => 
)

get_headers(https://www.pinterest.com/jonathan_parl/): failed to open stream: operation failed

Array
(
    [url] => https://www.pinterest.com/jonathan_parl/
    [exists] => 
)

Does anyone have an idea what I'm doing wrong here?

I mean, isn't Pinterest a popular social network with a valid certificate? (I don't use it personally; I just created an account for testing.)

Thank you for your help,

Jonathan Parent-Lévesque from Montreal

  • It appears that your server doesn't recognize the CA that issued Pinterest's SSL certificate. The solution is usually to update your CA certificates or to accept "insecure" certificates. That boils down to some system administration and/or modifying the code you use to fetch data from the URI (cURL, Guzzle?). If you google `OpenSSL Error messages: error:14090086`, you will find plenty of results and fixes. – N.B. Aug 19 '15 at 15:21
  • Creating a self-signed certificate didn't work for me, but switching to cURL did (and it's faster than get_headers()). Two birds with one stone, heh. You can check out my solution below (unit test included) if you're interested. Thank you – Jonathan Parent Lévesque Aug 19 '15 at 21:00
  • TL;DR... But you probably need to set a stream context and relax SSL certificate validation. Please have a look at [stream_context_set_default()](http://php.net/manual/en/function.stream-context-set-default.php) if you have the chance. – Álvaro González Aug 19 '15 at 21:03
  • I'm glad you got it sorted using the tips. – N.B. Aug 19 '15 at 22:29
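The stream-context suggestion from the comments can be sketched as follows. When no context is passed, `get_headers()` uses the default stream context, so relaxing SSL verification there affects every later `get_headers()` call in the script. This is a sketch of the commenter's idea, not the fix the asker ended up using; a safer variant would keep `verify_peer` enabled and point the `cafile` SSL context option at an up-to-date CA bundle (the right path depends on your installation).

```php
<?php
// Replace the default stream context so that https:// stream wrappers,
// including get_headers(), skip certificate verification.
// WARNING: this disables protection against man-in-the-middle attacks.
$defaultContext = stream_context_set_default([
    'ssl' => [
        'verify_peer'      => false, // don't validate the certificate chain
        'verify_peer_name' => false, // don't check the certificate's host name
    ],
]);

// Inspect what is now set on the default context.
$options = stream_context_get_options($defaultContext);
var_dump($options['ssl']['verify_peer']); // bool(false)

// From here on, get_headers() uses these settings, e.g.:
// $headers = get_headers('https://www.pinterest.com/jonathan_parl/');
```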

1 Answer


I tried creating a self-signed certificate for my development environment (Xampp), as suggested by N.B. in his comment, but that didn't work for me.

His other suggestion was to use cURL or Guzzle instead of get_headers(). Not only did it work, but according to this developer's tests:

http://php.net/manual/fr/function.get-headers.php#104723

it is also much faster than get_headers().

For those interested, here's the code of my new function:

/**
* Send an HTTP HEAD request to $url and check the status line of the response.
*
* @param string $url URL to which the request is sent.
* @param int[] $failCodeList HTTP status codes for which the page is considered invalid.
*
* @return bool true if the page exists, false otherwise.
*/
public static function isUrlExists($url, array $failCodeList = array(404)){

    $exists = false;

    if(!StringManager::stringStartWith($url, "http") and !StringManager::stringStartWith($url, "ftp")){

        $url = "https://" . $url;
    }

    if (preg_match(RegularExpression::URL, $url)){

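        $handle = curl_init($url);

        curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);  // return the response instead of printing it
        curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false); // don't verify the server's certificate
        curl_setopt($handle, CURLOPT_HEADER, true);          // include the response headers in the output
        curl_setopt($handle, CURLOPT_NOBODY, true);          // HEAD request: headers only, no body
        // CURLOPT_USERAGENT expects a string (true is converted to "1"); the exact value here is arbitrary.
        curl_setopt($handle, CURLOPT_USERAGENT, "Mozilla/5.0 (compatible; LinkChecker)");

        $headers = curl_exec($handle); // false on failure

        curl_close($handle);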


        if (empty($failCodeList) or !is_array($failCodeList)){

            $failCodeList = array(404); 
        }

        if (!empty($headers)){

            $exists = true;

            $headers = explode(PHP_EOL, $headers);

            foreach($failCodeList as $code){

                if (is_numeric($code) and strpos($headers[0], strval($code)) !== false){

                    $exists = false;

                    break;  
                }
            }
        }
    }

    return $exists;
}

Let me explain the cURL options:

CURLOPT_RETURNTRANSFER: return the response as a string instead of printing it directly.

CURLOPT_SSL_VERIFYPEER: cURL won't verify the server's certificate, which sidesteps the "certificate verify failed" error (at the cost of no longer detecting man-in-the-middle attacks).

CURLOPT_HEADER: include the response headers in the returned string.

CURLOPT_NOBODY: don't fetch the body (cURL sends a HEAD request instead of a GET).

CURLOPT_USERAGENT: some sites need a User-Agent header to respond properly (for example, https://plus.google.com).


Additional note: I explode the header string and use $headers[0] so that only the status line is checked, i.e. the return code and message (e.g. 200, 404, 405, etc.).

Additional note 2: Sometimes checking only for 404 is not enough (see the unit test), so the optional $failCodeList parameter lets you supply the full list of codes to reject.
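Because `strpos()` looks for the code anywhere in the status line, a fail code such as 1 would match "HTTP/1.1", and 40 would match "404". A stricter sketch extracts the three-digit status code and compares whole numbers. `parseStatusCode()` and `isFailStatus()` are hypothetical helper names; with cURL you could also read the code directly with `curl_getinfo($handle, CURLINFO_HTTP_CODE)` before calling `curl_close()`.

```php
<?php
/**
 * Hypothetical helper: extract the numeric status code from an HTTP status
 * line such as "HTTP/1.1 404 Not Found". Returns null if the line does not
 * look like a status line.
 */
function parseStatusCode(string $statusLine): ?int
{
    // "HTTP/" + version ("1.0", "1.1", "2", ...) + whitespace + 3-digit code.
    if (preg_match('#^HTTP/\d(?:\.\d)?\s+(\d{3})\b#', trim($statusLine), $matches) === 1) {
        return (int) $matches[1];
    }
    return null;
}

/**
 * Hypothetical helper: true if the status line carries one of the fail codes.
 * Whole codes are compared, so a fail code of 40 does not match "404".
 * An unparseable line is treated as a failure (a design choice of this sketch).
 */
function isFailStatus(string $statusLine, array $failCodeList = [404]): bool
{
    $code = parseStatusCode($statusLine);
    if ($code === null) {
        return true;
    }
    return in_array($code, array_map('intval', $failCodeList), true);
}

var_dump(isFailStatus("HTTP/1.1 404 Not Found\r"));                     // bool(true)
var_dump(isFailStatus("HTTP/2 200"));                                   // bool(false)
var_dump(isFailStatus("HTTP/1.1 404 Not Found", [40]));                 // bool(false)
var_dump(isFailStatus("HTTP/1.1 405 Method Not Allowed", [404, 405])); // bool(true)
```

In the function above, the `foreach` over `$failCodeList` could then be replaced by `$exists = !isFailStatus($headers[0], $failCodeList);`.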

And, of course, here's the unit test to back up my code:

public function testIsUrlExists(){

    //invalid
    $this->assertFalse(ToolManager::isUrlExists("woot"));
    $this->assertFalse(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque4545646456"));
    $this->assertFalse(ToolManager::isUrlExists("https://plus.google.com/+JonathanParentL%C3%A9vesque890800"));
    $this->assertFalse(ToolManager::isUrlExists("https://instagram.com/mariloubiz1232132/", array(404, 405)));
    $this->assertFalse(ToolManager::isUrlExists("https://www.pinterest.com/jonathan_parl1231/"));
    $this->assertFalse(ToolManager::isUrlExists("https://regex101.com/546465465456"));
    $this->assertFalse(ToolManager::isUrlExists("https://twitter.com/arcadefire4566546"));
    $this->assertFalse(ToolManager::isUrlExists("https://vimeo.com/**($%?%$", array(400, 405)));
    $this->assertFalse(ToolManager::isUrlExists("https://www.youtube.com/user/Darkjo666456456456"));

    //valid
    $this->assertTrue(ToolManager::isUrlExists("www.google.ca"));
    $this->assertTrue(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque"));
    $this->assertTrue(ToolManager::isUrlExists("https://plus.google.com/+JonathanParentL%C3%A9vesque"));
    $this->assertTrue(ToolManager::isUrlExists("https://instagram.com/mariloubiz/"));
    $this->assertTrue(ToolManager::isUrlExists("https://www.pinterest.com/"));
    $this->assertTrue(ToolManager::isUrlExists("https://regex101.com"));
    $this->assertTrue(ToolManager::isUrlExists("https://twitter.com/arcadefire"));
    $this->assertTrue(ToolManager::isUrlExists("https://vimeo.com/"));
    $this->assertTrue(ToolManager::isUrlExists("https://www.youtube.com/user/Darkjo666"));
}

I hope this solution will help someone,

Jonathan Parent-Lévesque from Montreal