
I have some URLs, and the HTML of each of them contains the following tag in its <head>:

    <link rel="image_src" href="http://imgv2-4.scribdassets.com/img/word_document/15490455/164x212/8a4ab0c34b/1337732662" />

I am using the following code:

    $url = 'my url';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);            // The URL to get links from
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // Return the response as a string
    $result = curl_exec($ch);
    curl_close($ch);

    // Collect the href of every <a> tag
    $regex = '|<a.*?href="(.*?)"|';
    preg_match_all($regex, $result, $parts);
    $links = $parts[1];
    foreach ($links as $link) {
        //if (strpos($link, 'format=json') !== false) {
            echo $link;
        //}
    }

Now I want to grab the href of this link tag, but I don't know how. Please help.

Thanks

way2project
  • When I tried to extract the content of the URL, a 400 Bad Request message was shown on my page. – way2project Jul 02 '12 at 20:13
  • re: *When I tried...*. Show your code. – Cheeso Jul 02 '12 at 20:14
  • Could you be more specific? Is it one html file? How are you extracting it at the moment? Could you post your php code? – Johan Jul 02 '12 at 20:14
  • $ch = curl_init(); curl_setopt($ch, CURLOPT_URL, $url); // The url to get links from curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // We want to get the respone $result = curl_exec($ch); $regex='| – way2project Jul 02 '12 at 20:16
  • Perhaps they are trying to stop you from stealing their property. –  Jul 02 '12 at 20:19
  • Please post the code in the question itself. You can edit your question. –  Jul 02 '12 at 20:19

3 Answers


I prefer using PHP's DOMDocument to walk through HTML rather than preg_match. Something like this should work:

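    $dom = new DOMDocument;
    libxml_use_internal_errors(true);   // don't print warnings for malformed HTML
    $dom->loadHTML($result);            // DOMXPath needs a DOMDocument, not a string
    $xpath = new DOMXPath($dom);
    $links = $xpath->query('//link[@rel="image_src"]');
    foreach ($links as $link) {
        $src = $link->getAttribute('href'); // <link> has no text content, so read href
        echo $src;
    }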
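A self-contained sketch of the same DOMXPath idea, assuming the markup is already in a string (here a hard-coded copy of the tag from the question rather than a live cURL fetch). The helper name imageSrcFromHtml is hypothetical. It reads the href attribute rather than nodeValue, because <link> is an empty element with no text content.

```php
<?php
// Hypothetical helper: return the href of the first <link rel="image_src">, or null if none.
function imageSrcFromHtml(string $html): ?string
{
    $dom = new DOMDocument();
    // Real-world pages are rarely valid HTML; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//link[@rel="image_src"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    // The URL lives in the href attribute, not in the element's text.
    return $nodes->item(0)->getAttribute('href');
}

$html = '<html><head><link rel="image_src" href="http://imgv2-4.scribdassets.com/img/word_document/15490455/164x212/8a4ab0c34b/1337732662" /></head><body></body></html>';
echo imageSrcFromHtml($html), "\n";
```

In a real script you would pass the $result string returned by curl_exec() instead of the hard-coded $html.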
Mark Roach

Here's another alternative that worked for me. It's similar to the DOMXPath suggestion by @Mark Roach:

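    $dom = new DOMDocument;
    libxml_use_internal_errors(true);   // don't print warnings for malformed HTML
    $dom->loadHTML($html);              // $html holds the page source, e.g. $result from curl_exec()
    $nodes = $dom->getElementsByTagName('link');
    foreach ($nodes as $node) {
        if ($node->getAttribute('rel') === 'image_src') {
            echo $node->getAttribute('href');
        }
    }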
socca1157

Like so

    <?php
    $url = 'http://www.scribd.com/doc/15490455/Learning-PHP-5';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);            // The URL to get links from
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // Return the response as a string
    $result = curl_exec($ch);
    curl_close($ch);

    // Capture the href of <link rel="image_src">; [^"]* stops at the closing quote
    $regex = '#<link rel="image_src" href="([^"]*)"#';
    if (preg_match($regex, $result, $parts)) {
        echo $parts[1]; // $parts[0] is the whole match, $parts[1] is the captured URL
    }
    ?>
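To illustrate why a parser is often preferred over preg_match for this, the sketch below runs an attribute-order-sensitive regex, similar in spirit to the one in this answer, and a DOM lookup against two spellings of the same tag. The example.com URL and the function names srcByRegex and srcByDom are made up for the demonstration.

```php
<?php
// Two spellings of the same tag: only the attribute order differs.
$ordered  = '<link rel="image_src" href="http://example.com/a.png" />';
$reversed = '<link href="http://example.com/a.png" rel="image_src" />';

// Hypothetical regex lookup: only matches when rel comes before href.
function srcByRegex(string $html): ?string
{
    return preg_match('#<link rel="image_src" href="([^"]*)"#', $html, $m) ? $m[1] : null;
}

// Hypothetical DOM lookup: attribute order and quoting style do not matter.
function srcByDom(string $html): ?string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    foreach ($dom->getElementsByTagName('link') as $node) {
        if ($node->getAttribute('rel') === 'image_src') {
            return $node->getAttribute('href');
        }
    }
    return null;
}

var_dump(srcByRegex($ordered));   // string(24) "http://example.com/a.png"
var_dump(srcByRegex($reversed));  // NULL
var_dump(srcByDom($reversed));    // string(24) "http://example.com/a.png"
```

The regex is fine if you control or know the exact markup; the DOM version is more forgiving when the page's HTML changes shape.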
Danny
  • Also, you might want to check whether cURL is installed on your host. Mark, your solution doesn't seem to work. Does XPath require the HTML to be valid, or am I missing something? – Danny Jul 02 '12 at 23:36
  • The scribd.com site in your example is XHTML, and the tree is in the "http://www.w3.org/1999/xhtml" namespace. Mark's solution doesn't work for that site, because the XPath query is assuming the default namespace. – Joe Liversedge Feb 05 '13 at 16:59
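A sketch of the namespace issue described in the last comment, assuming the page is parsed as XML with DOMDocument::loadXML. As far as I know, loadHTML goes through libxml's HTML parser and does not put elements in the XHTML namespace, so the unprefixed query works there. The prefix x and the example.com URL are arbitrary choices for the demo.

```php
<?php
// A minimal XHTML document: the default xmlns puts every element in the XHTML namespace.
$xhtml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <link rel="image_src" href="http://example.com/cover.png" />
  </head>
  <body />
</html>
XML;

$dom = new DOMDocument();
$dom->loadXML($xhtml);            // XML parsing keeps the namespace information
$xpath = new DOMXPath($dom);

// In XPath 1.0 an unprefixed element name means "no namespace", so this finds nothing.
$plain = $xpath->query('//link[@rel="image_src"]');

// Bind a prefix to the XHTML namespace and use it for element names (attributes stay unprefixed).
$xpath->registerNamespace('x', 'http://www.w3.org/1999/xhtml');
$prefixed = $xpath->query('//x:link[@rel="image_src"]');

echo $plain->length, "\n";                            // 0
echo $prefixed->item(0)->getAttribute('href'), "\n";  // http://example.com/cover.png
```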