How to extract a link from head tag of a remote page using curl

Question

I have some URLs, and the HTML of each of these pages contains the following tag in its head tag:

    <link rel="image_src" href="http://imgv2-4.scribdassets.com/img/word_document/15490455/164x212/8a4ab0c34b/1337732662" />

I am using the following code

    $url = 'my url';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);            // The URL to get links from
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // We want to get the response
    $result = curl_exec($ch);

    $regex = '|<a.*?href="(.*?)"|';
    preg_match_all($regex, $result, $parts);
    $links = $parts[1];
    foreach ($links as $link) {
        //if (strpos($link, 'format=json') !== false) {
            echo $link;
        //}
    }

Now I want to grab the href of this link tag, but I don't know how. Please help me.

Thanks

When I tried to extract the content of the url, then 400 BAD Request message is shown on my page — way2project, Jul 02 '12 at 20:13
Could you be more specific? Is it one html file? How are you extracting it at the moment? Could you post your php code? — Johan, Jul 02 '12 at 20:14
$ch = curl_init(); curl_setopt($ch, CURLOPT_URL, $url); // The url to get links from curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // We want to get the respone $result = curl_exec($ch); $regex='| — way2project, Jul 02 '12 at 20:16
Please post the code in the question itself. You can edit your question. — , Jul 02 '12 at 20:19

score 2 · Answer 1 · answered Jul 02 '12 at 21:32

I prefer walking the HTML with PHP's DOMDocument rather than using preg_match. Something like this should work:

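$dom = new DOMDocument();
libxml_use_internal_errors(true);   // keep malformed HTML from raising warnings
$dom->loadHTML($result);            // $result is the HTML string returned by curl_exec()
$xpath = new DOMXPath($dom);        // DOMXPath needs a DOMDocument, not a string
$links = $xpath->query('//link[@rel="image_src"]');
foreach ($links as $link) {
    $src = $link->getAttribute('href'); // the URL is in the href attribute; nodeValue is empty for <link>
}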

Mark Roach


score 2 · Answer 2 · answered Nov 19 '13 at 06:00

Here's another alternative that helped me. It's similar to the DOMXPath suggestion by @Mark Roach.

$dom = new DOMDocument;
$dom->loadHTML($html);
$nodes = $dom->getElementsByTagName('link');
foreach ($nodes as $node){
    if ($node->getAttribute('rel') === 'image_src')
    {
        echo($node->getAttribute('href'));
    }
}
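
Real-world pages are often not valid HTML, and loadHTML() emits warnings for them. Below is a minimal sketch of the same loop wrapped in a function that collects those parse errors instead of printing them. The function name findImageSrc and the sample markup are hypothetical, chosen only for illustration; in practice $html would be the string returned by curl_exec().

```php
<?php
// Hypothetical helper: returns the href of the first <link rel="image_src">, or null if none exists.
function findImageSrc(string $html): ?string
{
    $dom = new DOMDocument();

    // Collect libxml parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('link') as $node) {
        if ($node->getAttribute('rel') === 'image_src') {
            return $node->getAttribute('href');
        }
    }
    return null;
}

// Sample markup (an assumption) standing in for the fetched page; note the unclosed <p>.
$html = '<html><head><title>Demo</title>'
      . '<link rel="stylesheet" href="/style.css" />'
      . '<link rel="image_src" href="http://example.com/thumb.png" />'
      . '</head><body><p>Unclosed paragraph</body></html>';

echo findImageSrc($html), "\n";
```

Returning null when no matching tag is found lets the caller tell "tag missing" apart from an empty href.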

score 0 · Answer 3 · answered Jul 02 '12 at 21:15

Like so

    <?php
    $url = 'http://www.scribd.com/doc/15490455/Learning-PHP-5';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);            // The URL to fetch
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // Return the response instead of printing it
    $result = curl_exec($ch);

    // Non-greedy capture, so the match stops at the first closing quote after href="
    $regex = '#<link rel="image_src" href="(.*?)"\s*/?>#';
    if (preg_match($regex, $result, $parts)) {
        // $parts[0] is the whole match, $parts[1] is the captured href
        echo $parts[1];
    }
    ?>

edited Jul 02 '12 at 22:44

answered Jul 02 '12 at 21:15

Danny

Also, you might want to check whether curl is installed on your host. Mark, your solution doesn't seem to work. Does XPath require the HTML to be valid, or am I missing something? – Danny Jul 02 '12 at 23:36
The scribd.com site in your example is XHTML, and the tree is in the "http://www.w3.org/1999/xhtml" namespace. Mark's solution doesn't work for that site, because the XPath query is assuming the default namespace. – Joe Liversedge Feb 05 '13 at 16:59
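
To illustrate the namespace point: when a page is parsed as XML with DOMDocument::loadXML(), its elements live in the XHTML namespace, and an unprefixed //link query matches nothing until a prefix is registered. This is a minimal sketch, assuming a well-formed XHTML document. The prefix x and the sample markup are arbitrary choices for illustration. When the page is parsed with DOMDocument::loadHTML() instead, the elements are not namespaced and the plain //link query applies.

```php
<?php
// Sample well-formed XHTML (an assumption) standing in for the fetched page.
$xhtml = '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Demo</title>'
       . '<link rel="image_src" href="http://example.com/thumb.png" />'
       . '</head><body></body></html>';

$dom = new DOMDocument();
$dom->loadXML($xhtml);

$xpath = new DOMXPath($dom);

// An unprefixed name test only matches elements in no namespace, so this finds nothing.
$plain = $xpath->query('//link[@rel="image_src"]');

// Bind an arbitrary prefix to the XHTML namespace and use it in the query.
$xpath->registerNamespace('x', 'http://www.w3.org/1999/xhtml');
$namespaced = $xpath->query('//x:link[@rel="image_src"]');

echo $plain->length, "\n";      // 0
echo $namespaced->length, "\n"; // 1
foreach ($namespaced as $link) {
    echo $link->getAttribute('href'), "\n";
}
```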
