
I am trying to read all the links within a given URL.

Here is the code I am using:

$dom = new DomDocument();
@$dom->loadHTMLFile($url);
$urls = $dom->getElementsByTagName('a');

foreach ($urls as $url) {
    echo $url->innertext ." => ".$url->getAttribute('href');
}

The script lists all the links on the given URL.

The problem is that I cannot get image links (an image inside an anchor tag).

First I tried with

$url->nodeValue

But that returned only the text content of each anchor, so anchors containing just an image came back empty.

I want to read both image links and text links, with output in the format below.

Input :

<a href="link1.php">first link</a>
<a href="link2.php"> <img src="imageone.jpg"></a>

Current Output:

first link => link1.php
=>link2.php with warning (Undefined property: DOMElement::$innertext )

Required Output :

first link => link1.php
<img src="imageone.jpg">=>link2.php 
lonesomeday
  • What's the output you're getting? What output do you want? – lonesomeday Sep 12 '11 at 10:56
  • I am getting only the href values. For the text/image between the anchor tags it gives the warning "Undefined property: DOMElement::$innertext in /home/url/public_html/crawl2.php" –  Sep 12 '11 at 11:00
  • @Alfred that doesn't help to clarify your question. Please provide sample markup and the output you want to fetch from it. As for innerText: there is no such property in a DOMNode or DOMElement. – Gordon Sep 12 '11 at 11:02
  • @Gordon: Thanks. I have now updated the question with the current and required output. Please see the updated question –  Sep 12 '11 at 11:08
  • possible duplicate of [innerHTML in PHP's DomDocument?](http://stackoverflow.com/questions/2087103/innerhtml-in-phps-domdocument) – Gordon Sep 12 '11 at 11:10

1 Answer


innerText doesn't exist in PHP; it's a non-standard JavaScript extension to the DOM.

I think what you want is effectively an innerHTML property. There isn't a native way of achieving this. You can use saveXML or, from PHP 5.3.6 onwards, saveHTML to export the markup of each child node and concatenate the results:

function innerHTML($node) {
    $ret = '';
    // Serialize each child in turn; a separate loop variable avoids overwriting $node.
    foreach ($node->childNodes as $child) {
        $ret .= $child->ownerDocument->saveHTML($child);
    }
    return $ret;
}

Note that before PHP 5.3.6 you'll need to use saveXML instead, because saveHTML did not accept a node argument until that version.

You could then call it like so:

echo innerHTML($url) ." => ".$url->getAttribute('href');
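
Putting the pieces together, here is a minimal sketch of the whole script run against the sample markup from the question. It assumes PHP 5.3.6 or later. describeLinks is a hypothetical helper name introduced only for illustration, and loadHTML on a string stands in for loadHTMLFile($url).

```php
<?php
// Concatenate the serialized HTML of every child of $node.
function innerHTML(DOMNode $node) {
    $ret = '';
    foreach ($node->childNodes as $child) {
        // Passing a node to saveHTML() requires PHP >= 5.3.6.
        $ret .= $child->ownerDocument->saveHTML($child);
    }
    return $ret;
}

// Hypothetical helper: returns one "content => href" string per anchor.
function describeLinks($html) {
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // @ suppresses warnings about malformed markup
    $lines = array();
    foreach ($dom->getElementsByTagName('a') as $link) {
        // trim() drops the stray whitespace around the <img> in the second link.
        $lines[] = trim(innerHTML($link)) . ' => ' . $link->getAttribute('href');
    }
    return $lines;
}

// Sample input from the question.
$html = '<a href="link1.php">first link</a>'
      . '<a href="link2.php"> <img src="imageone.jpg"></a>';

echo implode("\n", describeLinks($html)), "\n";
```

The loop variable is named $link rather than $url so that it does not overwrite the page URL from the original script. For a real page, replace the loadHTML call with loadHTMLFile($url).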
lonesomeday
  • Can you please update your answer to match my code? Actually, I don't quite follow it. –  Sep 12 '11 at 11:14
  • It gives the error "DOMDocument::saveHTML() expects exactly 0 parameters, 1 given". –  Sep 12 '11 at 11:20
  • @Alfred See the note in my answer. You're clearly using an older version of PHP. Changing `saveHTML` to `saveXML` should make it work OK. – lonesomeday Sep 12 '11 at 11:21
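
For PHP versions before 5.3.6, the change suggested in the last comment might look like the sketch below. innerXML is a hypothetical name chosen here to keep it distinct from the saveHTML version; the only substantive change is the serializer call.

```php
<?php
// Variant for PHP < 5.3.6, where saveHTML() accepts no arguments.
function innerXML(DOMNode $node) {
    $ret = '';
    foreach ($node->childNodes as $child) {
        // saveXML() accepts an optional node and serializes only that subtree.
        $ret .= $child->ownerDocument->saveXML($child);
    }
    return $ret;
}

$dom = new DOMDocument();
@$dom->loadHTML('<a href="link2.php"><img src="imageone.jpg"></a>');

foreach ($dom->getElementsByTagName('a') as $link) {
    echo innerXML($link) . ' => ' . $link->getAttribute('href') . "\n";
}
```

Because saveXML emits XML rather than HTML, empty elements such as img are written in self-closing form, so the markup can differ slightly from the original source.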