
I have these 2 queries:


1) Get the URLs of all the images:

$imgs = $xpath->query('//div[@class="pin"]/div[@class="PinHolder"]/a/img');



2) Get the number of Facebook likes for each image:

$i = 0; // running counter for the spans that pass the threshold
foreach($xpath->query('//span[@class="LikesCount"]') as $span) {
    $int = (int) $span->nodeValue;
    if ($int > 5) {
        echo $i++ . "--> " . $int . "<br />";
    }
}



I'd like to merge them to get just the images that have been Facebook-liked more than 5 times. Note that pictures that haven't been liked don't have the LikesCount class at all.

Here is an example of the markup:

<div class="pin">

[...]

<a href="/pin/56787645270909880/" class="PinImage ImgLink">
    <img src="http://media-cache-ec3.pinterest.com/upload/56787645270909880_d7AaHYHA_b.jpg" 
         alt="Krizia" 
         data-componenttype="MODAL_PIN" 
         class="PinImageImg" 
         style="height: 288px;">
</a>

<p class="stats colorless">
    <span class="LikesCount"> 
        22 likes 
    </span>
    <span class="RepinsCount">
        6 repins
    </span>
</p>

[...]

</div>
C. M. Sperberg-McQueen
Andrea Puiatti
  • I have correctly formatted the question for you. Why are you constantly reverting it? – Madara's Ghost Dec 10 '12 at 22:07
  • sorry, first day on here.. thank you! – Andrea Puiatti Dec 10 '12 at 22:08
  • Please provide the complete input XML (the provided XML doesn't contain `div[@class="PinHolder"]`). – Kirill Polishchuk Dec 10 '12 at 22:19
  • You'll need XPath 2.0, but something like `//div[@class="pin"]/p/span[@class="LikesCount"][substring-before(., " ") > 5]/ancestor::div[@class="pin"]/a/img` should work as a start. – cmbuckley Dec 10 '12 at 22:37
  • @Geo: Again, why are you reverting my formatting? I don't like repeating myself! – Madara's Ghost Dec 10 '12 at 22:52
  • Here is the complete markup: http://pastebin.com/r3ZznXjF. @cbuckley thank you, but how do I display the results? `var_dump($domobj)` gives me object(DOMNodeList)#3 (0) { } – Andrea Puiatti Dec 11 '12 at 04:48
  • `foreach ($domobj as $node) { var_dump($node->ownerDocument->saveXML($node)); }`. But as I mentioned, you'll need XPath 2.0 to use it with `DOMXPath`. Have you thought about [`DOMXPath::registerPhpFunctions`](http://www.php.net/manual/en/domxpath.registerphpfunctions.php), as per [this question](http://stackoverflow.com/questions/8031377/using-regex-in-php-xpath-evaluate)? – cmbuckley Dec 11 '12 at 20:07

1 Answer


To retrieve only the images with a likes-count of more than 5, rather than all images, I'd try changing the XPath expression in the assignment to $imgs to read:

//div[@class="pin"]
     [.//span[@class = 'LikesCount']
             [substring-before(normalize-space(.),' ') > 5]]
     /div[@class="PinHolder"]
     /a/img

(I have added whitespace to make this a little easier to follow; you may need to eliminate the newlines if your XPath parser doesn't follow the spec in this matter [some don't]).

It's not clear to me why cmbuckley says this will require XPath 2.0; perhaps he sees some subtle issue here that I don't.
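As a rough way to check this, and to show how the result can be printed, here is a minimal PHP sketch. PHP's DOMXPath is built on libxml2, which implements XPath 1.0. The sample markup, the example.com URLs and the variable names `$html`, `$query` and `$urls` are made up for illustration: they are modelled on the question's snippet plus the `PinHolder` wrapper that the queries assume, and are not taken from the real page.

```php
<?php
// Hypothetical sample: three pins with 22 likes, 3 likes, and no LikesCount span.
$html = <<<'HTML'
<div class="pin">
  <div class="PinHolder">
    <a href="/pin/1/" class="PinImage ImgLink"><img src="http://example.com/a.jpg" alt="A"></a>
  </div>
  <p class="stats colorless">
    <span class="LikesCount">
        22 likes
    </span>
    <span class="RepinsCount"> 6 repins </span>
  </p>
</div>
<div class="pin">
  <div class="PinHolder">
    <a href="/pin/2/" class="PinImage ImgLink"><img src="http://example.com/b.jpg" alt="B"></a>
  </div>
  <p class="stats colorless">
    <span class="LikesCount"> 3 likes </span>
  </p>
</div>
<div class="pin">
  <div class="PinHolder">
    <a href="/pin/3/" class="PinImage ImgLink"><img src="http://example.com/c.jpg" alt="C"></a>
  </div>
</div>
HTML;

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// The expression from the answer, joined onto one logical line.
$query = '//div[@class="pin"]'
       . '[.//span[@class="LikesCount"][substring-before(normalize-space(.), " ") > 5]]'
       . '/div[@class="PinHolder"]/a/img';

// query() returns a DOMNodeList; each item here is an img DOMElement.
$urls = [];
foreach ($xpath->query($query) as $img) {
    $urls[] = $img->getAttribute('src');
}

print_r($urls); // with the sample above, only http://example.com/a.jpg is listed
```

The pin with 3 likes fails the numeric comparison, and the pin without a `LikesCount` span fails because the predicate selects an empty node-set, so neither image is returned.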

C. M. Sperberg-McQueen
  • Thank you McQueen, but I still don't understand how to print/treat the output of XPath. Is it an array, a single string or a DOM element? What the hell, I'm freaking out :D And with this query --> //span[@class="LikesCount"][normalize-space(.)] <-- it doesn't normalize anything. I'm so frustrated – Andrea Puiatti Dec 15 '12 at 21:15
  • From an XPath point of view, the XPath expression I gave evaluates to a set of nodes in an XML document. What representation a given programming language and library use for them depends on them, not on XPath. The [PHP documentation](http://www.php.net/manual/en/domxpath.query.php) says that DOMXPath::query returns a value of type `DOMNodeList`; if you know how to work with the results of your existing queries, you should know how to work with the result of the reformulated query. (If you don't, you seem to have a question about PHP's DOM library, not about XPath.) – C. M. Sperberg-McQueen Dec 15 '12 at 23:44
  • The expression `//span[@class='LikesCount']` matches every `span` element with a `class` attribute whose value is "`LikesCount`". Adding the predicate `[normalize-space(.)]` further restricts the result to those `span` elements for which the expression `normalize-space(.)` evaluates to true. The predicate thus has the effect of filtering out any `span` elements which are empty or which contain only whitespace. N.B. The value returned by `normalize-space(.)` is used to evaluate the predicate and then discarded: it will have no effect whatever on the DOM nodes returned. – C. M. Sperberg-McQueen Dec 15 '12 at 23:51
  • Discarded??? That's why I was freaking out, I didn't get that! So if I want to evaluate the "normalized" value I must 1) evaluate it in the XPath query OR 2) evaluate it in PHP, stripping the white space manually again! Thank you McQueen – Andrea Puiatti Dec 17 '12 at 10:22
  • It's certainly possible to write an XPath expression that evaluates to the normalized value (and thus returns the normalized value to the caller), but not while wrapping it inside a predicate. If this feels tricky, you might benefit from a good introduction to XPath. – C. M. Sperberg-McQueen Dec 17 '12 at 14:58
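The difference discussed in the comments above, between `normalize-space(.)` used inside a predicate and used as the outermost expression, can be sketched in PHP roughly as follows. This assumes `DOMXPath::evaluate`, which can return a string or a number where `DOMXPath::query` only returns node lists. The one-span markup and the variable names `$raw`, `$text` and `$count` are made up for illustration.

```php
<?php
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
// Hypothetical fragment: a single LikesCount span with surrounding whitespace.
$doc->loadHTML('<p><span class="LikesCount">
        22 likes
    </span></p>');
$xpath = new DOMXPath($doc);

// Inside a predicate, normalize-space(.) only filters; the node is returned as-is.
$span = $xpath->query('//span[@class="LikesCount"][normalize-space(.)]')->item(0);
$raw  = $span->nodeValue; // still carries the original whitespace

// As the outermost expression, the normalized string itself is the result.
$text = $xpath->evaluate('normalize-space(//span[@class="LikesCount"])');

// The leading number can be extracted in XPath as well; evaluate() returns a float.
$count = $xpath->evaluate(
    'number(substring-before(normalize-space(//span[@class="LikesCount"]), " "))'
);

var_dump($raw, $text, $count);
```

Note that `normalize-space()` applied to a node-set uses only the first node in document order, so for several spans it may be simpler to loop over the `DOMNodeList` and call `evaluate('normalize-space(.)', $span)` with each span as the context node.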