Using XPath to extract data by introducing ancestor in the XPath query

Question

I am using the following code:

$doc = new DOMDocument();
$doc->strictErrorChecking = false;
@$doc->loadHTML($data);
$xpath = new DOMXPath($doc);
// Select the parent node
$categories = $xpath->query('//span[@class="refinementLink"]/ancestor::a/li/ul');
$abcd = array();
var_dump($categories);
foreach ($categories as $category) {
    $abcd[] = $category->nodeValue;
    print_r('<br/>'.$abcd);
    // Crafts, Hobbies & Home (19)
}
//var_dump($abcd);

Now, what does this code do? It selects a span tag. The DOM nesting around the span tag is:

ul--li(4)--a(2)--span(3)

The output is:

object(DOMNodeList)[3]

It looks like I am doing things correctly; there are 3 span tags in my HTML document. What I need is the text of these span tags, i.e. the text between the opening and closing span tags. Any help?

Wrikken · Accepted Answer · 2012-01-26T23:52:38.103

1

Use the ->textContent property:

foreach ($categories as $category) {
    $abcd[]=$category->textContent; 
}
var_dump($abcd);
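
As a minimal sketch of how ->textContent behaves once the query actually matches something: the markup below is hypothetical, made up to mirror the ul > li > a > span nesting described in the question, and the variable names $spans, $links and $full are illustrative only.

```php
<?php
// Hypothetical sample markup (an assumption, not the asker's real page).
$data = '<ul>'
      . '<li><a href="#"><span class="refinementLink">Crafts, Hobbies &amp; Home</span> (19)</a></li>'
      . '<li><a href="#"><span class="refinementLink">Cooking</span> (7)</a></li>'
      . '</ul>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of silencing with @
$doc->loadHTML($data);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Select the spans themselves: no ancestor step is needed to read their text.
$spans = $xpath->query('//span[@class="refinementLink"]');
$abcd = array();
foreach ($spans as $span) {
    $abcd[] = trim($span->textContent);
}

// Climb to the enclosing a element to also pick up the text that follows the span.
$links = $xpath->query('//span[@class="refinementLink"]/ancestor::a');
$full = array();
foreach ($links as $link) {
    $full[] = trim($link->textContent);
}

print_r($abcd);
print_r($full);
```

On real pages the class attribute may hold several class names, in which case an exact @class="..." comparison would not match; that case is not covered here.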

edited Jan 26 '12 at 23:52

answered Jan 26 '12 at 23:44


Sorry, I am unable to understand; it looks like I have to memorize the whole list while solving this problem – Zaffar Saffee Jan 26 '12 at 23:50
That is the property you want... I was a bit lazy, I'll add in a bit ;) No need to memorize lists (I don't either), but reading through all the documentation of the object you have at hand if you want a property works well with the excellent php-documentation. – Wrikken Jan 26 '12 at 23:51
I tried the replacements $abcd[]=$category->nodeValue->textcontent; and $abcd[]=$category->textcontent; but still get the same output – Zaffar Saffee Jan 26 '12 at 23:55
BTW: `object(DOMNodeList)[3]` does not mean there are 3 elements in it afaik, what does `var_dump($categories->length);` say? – Wrikken Jan 27 '12 at 00:01
var_dump($categories->length); returns int 0 – Zaffar Saffee Jan 27 '12 at 00:04
Using $category->textContent with a capital C still gives the same output as in the question – Zaffar Saffee Jan 27 '12 at 00:04
No chat for me, I'm offline in 1 minute. Suffice to say: your XPath doesn't work, it doesn't find the nodes. Return to that: until you have a positive number here there are no matches (examine the HTML carefully). After that, you can use the `textContent`. – Wrikken Jan 27 '12 at 00:08
BTW: look at the output of `$doc->saveHTML()` to see exactly what DOMDocument loaded, sometimes it differs from the actual input... – Wrikken Jan 27 '12 at 00:10
Thanks for the help... I reached this point by working on http://stackoverflow.com/questions/9024649/extracting-node-values-using-xpath – Zaffar Saffee Jan 27 '12 at 00:11
If you can help me out, it would be nice of you – Zaffar Saffee Jan 27 '12 at 00:11
Great comment, $doc->saveHTML() showed me where I was stuck – Zaffar Saffee Jan 27 '12 at 23:32
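
The debugging steps suggested in the comments (check ->length first, then inspect $doc->saveHTML()) can be sketched roughly as follows. The markup is hypothetical and only assumes the ul > li > a > span nesting from the question; with it, the original query returns nothing because li is the parent of a, not its child. Whether that was the cause on the asker's real page is not confirmed in the thread.

```php
<?php
// Hypothetical input (an assumption): nesting is ul > li > a > span.
$data = '<ul><li><a href="#"><span class="refinementLink">Cooking</span> (7)</a></li></ul>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of silencing with @
$doc->loadHTML($data);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// The query from the question: after ancestor::a it looks for an li child
// of the a element, so for this markup nothing matches.
$original = $xpath->query('//span[@class="refinementLink"]/ancestor::a/li/ul');
var_dump($original->length);   // zero matches here

// Climbing with ancestor:: all the way up to the ul does match.
$fixed = $xpath->query('//span[@class="refinementLink"]/ancestor::ul');
var_dump($fixed->length);      // one match here

// Show what DOMDocument actually parsed; it may differ from the raw input.
echo $doc->saveHTML();
```

Only once ->length is positive is it worth looking at textContent or nodeValue.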

ttback · Answer 2 · 2012-01-27T18:57:37.170

I'm thinking you can probably pull the @attribute at the start when you do the XPath query. Predicates in XPath handle the foreach for you.

I use XML developer from Oxygen IDE, which works pretty well to show what XPath parses out of XML so you can be more certain about what to expect.

//span/@text[../@class="refinementLink"]/ancestor::a/li/ul

I am not sure whether text is an attribute of your target element, but in XPath whatever comes right before the [] is what you select. You selected a node, so you had to do additional work afterwards. If you pull out a sequence of strings instead, you might get something else. I never tried it myself; I am just offering an alternative thought.
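
A rough sketch of the "pull out strings directly" idea, under the assumption that the target is the text inside the span rather than an attribute: in XPath that text is reached with the text() node test or the string() function, not with @text. The markup and the names $texts and $first are hypothetical.

```php
<?php
// Hypothetical markup (an assumption, not the asker's real page).
$data = '<ul><li><a href="#"><span class="refinementLink">Cooking</span> (7)</a></li></ul>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of silencing with @
$doc->loadHTML($data);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// text() selects the text nodes that are direct children of the matching spans.
$texts = array();
foreach ($xpath->query('//span[@class="refinementLink"]/text()') as $textNode) {
    $texts[] = $textNode->nodeValue;
}

// DOMXPath::evaluate() can return a typed result; string() gives the
// string value of the first matching node.
$first = $xpath->evaluate('string(//span[@class="refinementLink"])');

print_r($texts);
var_dump($first);
```

query() still returns a DOMNodeList here, so a loop is needed for more than one match; evaluate() with string() only covers the first one.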
