Possible Duplicate:
Grabbing the href attribute of an A element

I need to extract all links from an HTML document whose href contains a given word (the word is different each time).

Example:

<a href="/bla:bla">BLA</a>
<a href="/link:link">BLA</a>
<a href="/link:bla">BLA</a>

I only need the links whose href starts with "/link:". What's the best way to go about it?

$html = "SOME HTML ";
$dom = new DOMDocument();
@$dom->loadHTML($html);
$urls = $dom->getElementsByTagName('a');
foreach ($urls as $url)
{
    echo "<br> {$url->getAttribute('href')} , {$url->getAttribute('title')}";
    echo "<hr><br>";
}

This example prints every link, but I only need specific ones.

Ron

4 Answers

Filter the links inside the loop with a condition:

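<?php
// Prefix the href must start with.
$lookfor = '/link:';

// $urls is the node list from getElementsByTagName('a') in the question.
foreach ($urls as $url) {
    $href = $url->getAttribute('href');
    // strncmp() compares only the first strlen($lookfor) bytes.
    if (strncmp($href, $lookfor, strlen($lookfor)) === 0) {
        echo "<br> " . $href . " , " . $url->getAttribute('title');
        echo "<hr><br>";
    }
}
?>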
Lawrence Cherone

Instead of first fetching all the a elements and then filtering out the ones you need, you can query the document for those nodes directly with XPath:

//a[contains(@href, "link:")]

This query will find all a elements in the document which contain the string link: in the href attribute.

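To check whether the href attribute starts with a given prefix, use starts-with(). The hrefs in the question begin with a slash, so the prefix to test for is /link:

//a[starts-with(@href, "/link:")]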

Full example:

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//a[contains(@href, "link:")]') as $a) {
    echo $a->getAttribute('href'), PHP_EOL;
}
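
Because the word to match changes from run to run, the query can be built from a variable. The following is only a minimal sketch: the helper name findLinksByPrefix is hypothetical, and it assumes the prefix contains no double quote, since XPath 1.0 string literals have no escape syntax.

```php
<?php
/**
 * Hypothetical helper: return the href of every <a> whose href starts with $prefix.
 */
function findLinksByPrefix(string $html, string $prefix): array
{
    // XPath 1.0 string literals cannot escape quotes, so reject them up front.
    if (strpos($prefix, '"') !== false) {
        throw new InvalidArgumentException('Prefix must not contain a double quote.');
    }

    // Collect parser warnings instead of printing them; real-world HTML is rarely valid.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $hrefs = [];
    foreach ($xpath->query('//a[starts-with(@href, "' . $prefix . '")]') as $a) {
        $hrefs[] = $a->getAttribute('href');
    }
    return $hrefs;
}

// Sample markup taken from the question.
$html = '<a href="/bla:bla">BLA</a><a href="/link:link">BLA</a><a href="/link:bla">BLA</a>';
$result = findLinksByPrefix($html, '/link:');
print_r($result);
```

Swapping starts-with for contains in the query string would give the looser "anywhere in the href" match instead.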

Note: marking this CW because of the many related questions

Gordon

Use regular expressions.

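$links = [];
foreach ($urls as $url)
{
    $href = $url->getAttribute('href');
    // Keep only hrefs that start with "/link:".
    if (preg_match('#^/link:#', $href)) {
        // Index the href by the link's title attribute.
        $links[$url->getAttribute('title')] = $href;
    }
}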

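The $links array then maps the title of each matching link to its href. Because the title is the array key, links that share a title (or have none) overwrite one another.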
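
Since the word being searched for varies, the pattern would have to be built at run time. A minimal sketch, assuming the word arrives in a hypothetical variable $word and using a plain array of sample hrefs in place of the DOM nodes:

```php
<?php
// Hypothetical input: the word to look for changes from run to run.
$word = 'link';
// Sample hrefs taken from the question, standing in for the DOM nodes.
$hrefs = ['/bla:bla', '/link:link', '/link:bla'];

// preg_quote() escapes regex metacharacters in $word; '#' is passed as the
// delimiter so that it gets escaped as well.
$pattern = '#^/' . preg_quote($word, '#') . ':#';

$matches = [];
foreach ($hrefs as $href) {
    if (preg_match($pattern, $href) === 1) {
        $matches[] = $href;
    }
}
print_r($matches);
```

Without preg_quote(), a word containing characters such as . or + would be interpreted as regex syntax rather than matched literally.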

Mario Lurig

Since getAttribute() simply returns a string, you only need to check what that string starts with, which strpos() can do.

$href = $url->getAttribute('href');
// strpos() returns 0 only when the match is at the very start of the string.
if (strpos($href, '/link:') === 0)
{
    // Do your processing here
}
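
On PHP 8.0 or later, the same check can be written with str_starts_with(). A minimal sketch of the whole loop, assuming sample markup based on the question and a hypothetical $prefix variable:

```php
<?php
// Sample markup based on the question; $prefix is a hypothetical variable.
$html = '<a href="/bla:bla">BLA</a><a href="/link:link">BLA</a><a href="/link:bla">BLA</a>';
$prefix = '/link:';

$dom = new DOMDocument();
@$dom->loadHTML($html); // warnings suppressed, as in the question

$found = [];
foreach ($dom->getElementsByTagName('a') as $url) {
    $href = $url->getAttribute('href');
    // str_starts_with() requires PHP 8.0+; for a non-empty prefix it behaves
    // the same as strpos($href, $prefix) === 0.
    if (str_starts_with($href, $prefix)) {
        $found[] = $href;
    }
}
print_r($found);
```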
GordonM