How to get a link with specific content using simple html dom

Question

I want to get "/contact/new" from <a href="/contact/new">Contact us</a>. The condition is: if a link has the text 'Contact' or 'Contact us', get its href value. The links have no class.

How can I do this?

granch · Answer 1 · 2017-07-15T05:25:58.273


Using regex and PHP:

$text = '<a href="/contact/new">Contact us</a>';

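// (?:Contact us|Contact) is an alternation; square brackets would define a character class instead
preg_match_all('~<a href="([^"]*)">(?:Contact us|Contact)</a>~', $text, $matches);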
foreach ($matches[1] as $href) {
    // Do whatever you want with the href attribute
    echo $href;
}

Using jQuery:

Select all a elements, check whether each one's html() is the text you are looking for, and if so read attr("href"):

$("a").each(function(index, element) {
    // The callback parameter is named element, so use that name consistently
    if ($(element).html() == "Contact" || $(element).html() == "Contact us") {

        // Do whatever you want with the href attribute
        console.log($(element).attr("href"));

    }
});

edited Jul 15 '17 at 05:25

answered Jul 15 '17 at 04:38

granch


You mean with PHP parsing the `a` tag like a string? – granch Jul 15 '17 at 04:43
Yes, Parsing all `a` tags. – prokawsar Jul 15 '17 at 04:45
jQuery has `:contains` so you could just use that. Also your regex has some major issues. – pguardiario Jul 15 '17 at 22:34
@pguardiario I understood that the selection has to be for `a` tags with the literal texts `Contact` and `Contact us`. That's why I use `html()`. I've been testing the regex but really don't know what major issues you are talking about. – granch Jul 17 '17 at 22:29
I suggest you post a "what's wrong with this regex" question. You'll get lots of good feedback that way. More than I can give in this comment. – pguardiario Jul 17 '17 at 22:59

score 0 · Answer 2 · answered Jul 15 '17 at 05:02


I solved it with this piece of code, after getting the approach from @Matias Cerrotta:

foreach ($dom->find('a') as $element) {
    echo $element->plaintext . '<br>';
}
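
The loop above prints the text of every link; to answer the original question, that text still has to be compared against the wanted labels and the href read. Here is a minimal sketch of that step. It uses PHP's built-in DOMDocument rather than Simple HTML DOM, so it runs without extra files. The function name findHrefByLinkText and the sample markup are hypothetical, not part of the answer. With Simple HTML DOM, the same idea would be to compare trim($element->plaintext) inside the loop and read $element->href on a match.

```php
<?php
// Hypothetical helper (not from the answer): returns the href of the first
// <a> whose trimmed text equals one of the wanted labels, or null if none match.
function findHrefByLinkText(string $html, array $labels): ?string
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them,
    // since real-world HTML is rarely perfectly valid.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('a') as $link) {
        $text = trim($link->textContent);
        // Exact match on the link text, and only if an href is present
        if (in_array($text, $labels, true) && $link->hasAttribute('href')) {
            return $link->getAttribute('href');
        }
    }
    return null;
}

// Sample markup assumed for illustration
$sample = '<a href="/about">About</a> <a href="/contact/new">Contact us</a>';
echo findHrefByLinkText($sample, ['Contact', 'Contact us']); // prints /contact/new
```

Matching on the exact trimmed text avoids picking up links such as "Contact lenses"; use a substring check instead if looser matching is wanted.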

answered Jul 15 '17 at 05:02

prokawsar


score 0 · Answer 3 · answered Jul 15 '17 at 05:36

This can be accomplished using SimpleXML and XPath.

You will need to adjust how the page is loaded into SimpleXML: read it into a variable with file_get_contents or some other method, then pass that string through.

Below is a working mock-up:

<?php
$html = '
<a href="/contact/new">Contact us</a>
';

//Replace with your loading logic here
$xml = simplexml_load_string($html);

//Perform the search
$search = $xml->xpath('//a[contains(text(), "Contact us") or contains(text(), "Contact")]');

//Check that the query succeeded and returned at least one match
if (!empty($search))
{
    //Get first item
    $item = $search[0];

    //Get item attributes
    $attributes = $item->attributes();

    //Output the HREF attribute only if it exists
    if (isset($attributes['href'])) {
        echo $attributes['href'];
    }
}

The xpath method returns an array of matches, which you will need to filter if more than one result is returned. In this sample I grab the first one and output the href attribute of the node.

The search finds all a tags regardless of their position in the string/document and checks whether their text contains either "Contact us" or "Contact".

Note: XPath is case sensitive. There are ways to make it case insensitive, but you will need to implement this yourself or write more conditions to check for.

If you need case insensitivity, see this earlier Stack Overflow question, where it has been covered before:

case insensitive xpath searching in php
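
One common way to get case-insensitive matching in XPath 1.0 (the version SimpleXML supports) is to lower-case the node text with translate() before comparing. The following is only a sketch under that assumption; the sample markup and variable names are made up for illustration.

```php
<?php
// Sample markup assumed for illustration; SimpleXML needs well-formed XML
// with a single root element.
$html = '<div><a href="/about">About</a><a href="/contact/new">CONTACT US</a></div>';
$xml = simplexml_load_string($html);

// XPath 1.0 has no lower-case() function, so translate() maps A-Z to a-z.
$upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$lower = 'abcdefghijklmnopqrstuvwxyz';
$needle = 'contact'; // must already be lower case

$query = sprintf(
    '//a[contains(translate(normalize-space(.), "%s", "%s"), "%s")]',
    $upper,
    $lower,
    $needle
);

// Collect the href of every matching link
$hrefs = [];
$matches = $xml->xpath($query);
if (!empty($matches)) {
    foreach ($matches as $link) {
        if (isset($link['href'])) {
            $hrefs[] = (string) $link['href'];
        }
    }
}

print_r($hrefs);
```

Because "Contact us" always contains "Contact", a single lower-cased needle covers both labels here. Note that translate() as written only handles ASCII letters.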
