I have a set of HTML items to parse. I need to extract the contents of every div whose class name ends with 'uid-g-uid'. Here are some sample divs:

<div class="uid-g-uid">1121</div>

<div class="yskisghuid-g-uid">14234</div>

<div class="kif893jduid-g-uid">114235</div>

I have tried the following combinations, but neither worked:

$doc = new DOMDocument();
$bdy = 'HTML Content goes here...';
@$doc->loadHTML($bdy);
$xpath = new DomXpath($doc);
$div = $xpath->query('//*[@class=ends-with(., "uid-g-uid")]');

I also tried:

$doc = new DOMDocument();
$bdy = 'HTML Content goes here...';
@$doc->loadHTML($bdy);
$xpath = new DomXpath($doc);
$div = $xpath->query('//*[@class="*uid-g-uid"]');

Please help!

Álvaro González
Guns

4 Answers

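ends-with() requires XPath 2.0, so it won't work with DOMXPath, which only supports XPath 1.0. Something like this should work, though. The 8 is the length of "uid-g-uid" (9 characters) minus one, so substring() returns the last 9 characters of @class: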

$xpath->query('//*["uid-g-uid" = substring(@class, string-length(@class) - 8)]');
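
For completeness, here is a minimal end-to-end sketch of that expression applied to the sample divs from the question. The variable names and the extra non-matching div are illustrative assumptions, not part of the original answer.

```php
<?php
// Sample markup from the question, plus one div that should NOT match
// (illustrative assumption): "uid-g-uid" is present but not at the end.
$html = '<div class="uid-g-uid">1121</div>'
      . '<div class="yskisghuid-g-uid">14234</div>'
      . '<div class="kif893jduid-g-uid">114235</div>'
      . '<div class="uid-g-uid-extra">0</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// "uid-g-uid" is 9 characters long, so the suffix starts at
// position string-length(@class) - 8 (XPath positions are 1-based).
$nodes = $xpath->query(
    '//*["uid-g-uid" = substring(@class, string-length(@class) - 8)]'
);

// Collect the text content of every matched element.
$values = [];
foreach ($nodes as $node) {
    $values[] = $node->nodeValue;
}

print_r($values);
```

Elements without a class attribute (such as the html and body wrappers that loadHTML adds) yield an empty string from substring(), so they never match.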
Damien Legros

You want an XPath 1.0 query that checks whether a string ends with a given substring. The ends-with() string function is not available in that version.

I can see several ways to do this. Since in your case the substring appears only once and, when present, always at the end, you can simply use contains():

//*[contains(@class, "uid-g-uid")]

If the substring could also appear elsewhere in the class and you want to exclude those cases, check that nothing follows it:

//*[contains(@class, "uid-g-uid") and substring-after(@class, "uid-g-uid") = ""]

If the substring could occur more than once, this won't work either, because substring-after() splits at the first occurrence. In that case, check directly whether the string ends with it:

//@class[substring(., string-length(.) - 8, 9) = "uid-g-uid"]/..

This is probably the most straightforward variant. Alternatively, since the third argument of substring() is optional, you can omit it to compare through to the end of the string:

//@class[substring(., string-length(.) - 8) = "uid-g-uid"]/..
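
To see how the variants differ, here is a small sketch that runs them against three made-up class values. The helper xpathEndsWith() is hypothetical (it is not part of DOMXPath) and assumes a non-empty ASCII suffix containing no double quotes; the sample markup is also an assumption chosen to exercise the edge cases.

```php
<?php
// Hypothetical helper: builds an XPath 1.0 predicate equivalent to
// ends-with($attr, $suffix). Assumes a non-empty ASCII suffix with no
// double quotes, so strlen() agrees with XPath's character count.
function xpathEndsWith(string $attr, string $suffix): string
{
    $offset = strlen($suffix) - 1; // 8 for "uid-g-uid"
    return sprintf(
        'substring(%1$s, string-length(%1$s) - %2$d) = "%3$s"',
        $attr,
        $offset,
        $suffix
    );
}

// Illustrative markup: suffix only (a), suffix not at the end (b),
// suffix occurring twice (c).
$html = '<div class="uid-g-uid">a</div>'
      . '<div class="uid-g-uid-x">b</div>'
      . '<div class="uid-g-uid-x-uid-g-uid">c</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$queries = [
    'contains'        => '//*[contains(@class, "uid-g-uid")]',
    'substring-after' => '//*[contains(@class, "uid-g-uid") and substring-after(@class, "uid-g-uid") = ""]',
    'substring'       => '//*[' . xpathEndsWith('@class', 'uid-g-uid') . ']',
];

// Concatenate the text of the matched divs for each variant.
$results = [];
foreach ($queries as $name => $query) {
    $results[$name] = '';
    foreach ($xpath->query($query) as $node) {
        $results[$name] .= $node->nodeValue;
    }
}

print_r($results);
```

In this sketch contains() matches all three divs, the substring-after() variant matches only the first, and the substring() variant matches the first and third, which are the two whose class really ends with the suffix.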
M8R-1jmw5r

Since you're looking for an XPath function that is not available in XPath 1.0, you can use the DOMXPath::registerPhpFunctions feature provided by PHP to call any PHP function from your XPath query. With it you can even call preg_match, like this:

$html = <<< EOF
<div class="uid-g-uid">1121</div>
<div class="yskisghuid-g-uid">14234</div>
<div class="kif893jduid-g-uid">114235</div>
EOF;
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html); // loads your html
$xpath = new DOMXPath($doc);

// Register the php: namespace (required)
$xpath->registerNamespace("php", "http://php.net/xpath");

// Register PHP preg_match function
$xpath->registerPHPFunctions('preg_match');

// call PHP preg_match function on your xpath to make sure class ends
// with the string "uid-g-uid" using regex "/uid-g-uid$/"
$nlist = $xpath->evaluate('//div[php:functionString("preg_match",
                           "/uid-g-uid$/", @class) = 1]/text()');

$numnodes = $nlist->length; // no of divs matched
for($i=0; $i < $numnodes; $i++) { // run the loop on matched divs
   $node = $nlist->item($i);
   echo "val: " . $node->nodeValue . "\n";
}
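
The same mechanism can be sketched without a regex. The variant below assumes PHP 8.0 or later, where the built-in str_ends_with() exists; that version requirement and the extra non-matching div are assumptions added here, not part of the original answer.

```php
<?php
// Assumes PHP 8.0+, where str_ends_with() is available.
$html = '<div class="uid-g-uid">1121</div>'
      . '<div class="yskisghuid-g-uid">14234</div>'
      . '<div class="kif893jduid-g-uid">114235</div>'
      . '<div class="uid-g-uid-extra">0</div>'; // should not match

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// The php: namespace must be registered before php:functionString can be used.
$xpath->registerNamespace('php', 'http://php.net/xpath');

// Expose only the one function the query needs.
$xpath->registerPhpFunctions('str_ends_with');

// php:functionString passes @class as a string; the boolean returned by
// str_ends_with() becomes the truth value of the predicate.
$nodes = $xpath->query(
    '//div[php:functionString("str_ends_with", @class, "uid-g-uid")]'
);

$values = [];
foreach ($nodes as $node) {
    $values[] = $node->nodeValue;
}

print_r($values);
```

Passing a function name to registerPhpFunctions() restricts the callbacks available to the query, which is worth doing whenever the XPath expression might contain untrusted input.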
anubhava

Try this:

// First, use a regex to replace the matching class with a findable flag.
// [^"]* keeps the match inside a single attribute value.
$bdy = preg_replace('/class="[^"]*uid-g-uid"/i', 'class="__FINDME__"', $bdy);

// Now search for the flag instead
$dom = new DOMDocument();
@$dom->loadHTML($bdy);
$xpath = new DOMXPath($dom);

$divs = $xpath->evaluate("//div[@class = '__FINDME__']");
var_dump($divs->length); // debug check: should be >= 1, otherwise something is wrong

for ($j = 0; $j < $divs->length; $j++)
{
    $div = $divs->item($j);
    $div_value = $div->nodeValue;
    // ...
}
Raheel Hasan
  • I am getting an error DOMXPath::evaluate(): xmlXPathCompOpEval: function ends-with not found – Guns Apr 09 '13 at 12:24
  • This is the basic scheme for how to get it. You just need to put the right class search into it. – Raheel Hasan Apr 09 '13 at 12:32
  • Check the updated version. I have done it with a small regex trick. – Raheel Hasan Apr 09 '13 at 12:47
  • What if the original div already has the string `__FINDME__` in its class attribute? You cannot always manually examine the input text in advance to figure out what string should be used for this pre-processing. – anubhava Apr 09 '13 at 13:31
  • Now who would have `__FINDME__` as a class?! And if they do, change it to `____FINDME____` (or something more complex)! – Raheel Hasan Apr 09 '13 at 13:34
  • @RaheelHasan: That means a coder needs to know HTML in advance? Will it always be feasible? – anubhava Apr 09 '13 at 14:11
  • @RaheelHasan That is very true! In fact, `__FINDME__` itself is complex and would hardly ever be a class, and a good HTML programmer would not make the mistake of using such classes anyway! – Guns Apr 09 '13 at 16:51