
I am having trouble using TFHpple. I would like to parse the following HTML:

<div class=\"head\" style=\"height: 69.89px; line-height: 69.89px;\">
    <div class=\"cell editable\" style=\"width: 135px;\"contenteditable=\"true\">
        <p>&nbsp;1</p>
    </div>
    <div class=\"cell\" style=\"width: 135px;\" contenteditable=\"false\">
        <p>2</p>
    </div>
</div>

<div style=\"height: 69.89px; line-height: 69.89px;\" class=\"head\">
    <div class=\"cell\" style=\"width: 135px; text-align: left;\"contenteditable=\"false\">
        <p>3&nbsp;</p>
    </div>
    <div class=\"cell\" style=\"width: 135px;\" contenteditable=\"false\">
        <p>4</p>
    </div>
</div>

<div style=\"height: 69.89px; line-height: 69.89px;\" class=\"\">
    <div class=\"cell\" style=\"width: 135px;\" contenteditable=\"false\">
        <p>5</p>
    </div>
    <div class=\"cell\" style=\"width: 135px;\" contenteditable=\"false\">
        <p>6</p>
    </div>
</div>

For now I would like to put the first-level div elements (sorry, I don't know the proper terminology) into an array. I tried simply passing /div as the XPath to the searchWithXPathQuery: method, but it doesn't find anything.

My second idea was to use a path like //div[@class=\"head\"] that also allows [@class=\"\"], but I don't know whether that is possible. (I need the elements to be in the same order in the array as they are in the data.)

So here is my question: is there a particular reason why TFHpple wouldn't work with /div? And if there is no way to select only the first level of div elements, is it possible to write an XPath predicate on the value of an attribute (here, class)? If so, how? I have searched quite a lot and couldn't find anything.

Thanks for your help.

PS: If it helps, here is the code I use to parse the data, which is initially held in the string self.material.Text:

NSData * data = [self.material.Text dataUsingEncoding:NSUnicodeStringEncoding];
TFHpple * tableParser = [TFHpple hppleWithHTMLData:data];
NSString * firstXPath = @"/div";
NSArray<TFHppleElement *> * tableHeader = [tableParser searchWithXPathQuery:firstXPath];
// count is an NSUInteger, so cast it and print it with %lu rather than %d.
NSLog(@"We found : %lu", (unsigned long)tableHeader.count);
Hugues Duvillier

2 Answers


You can use the following XPath expression to get the div elements (that is indeed the correct term) whose class attribute is either "head" or empty:

//div[@class='head' or @class='']
har07
  • Thanks for your answer, it does solve part of my problem. However, sometimes the `class` attribute is missing entirely rather than empty (I can't fix that, I fetch the files from a web platform). Is there a way I can still select the element? (Something equivalent to `/div`, maybe?) – Hugues Duvillier Sep 16 '15 at 08:53
  • @HuguesDuvillier, This solution will also work if it is empty. In XPath 1.0, the string value of an empty node is an empty string (this changed slightly in XPath 2.0). If you also want to check for spaces-only, use `normalize-space(@class)`. – Abel Sep 16 '15 at 09:19
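
For the case raised in the comments, where the `class` attribute is absent altogether: in XPath 1.0 a comparison against an empty node-set is false, so `@class=''` does not match a div that has no `class` attribute at all. Adding `not(@class)` to the predicate covers that case. A minimal sketch, assuming the TFHpple setup from the question; the helper name `HeadOrUnclassedDivs` is made up for illustration:

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper: returns every <div> whose class is "head", empty,
// or missing altogether.
static NSArray<TFHppleElement *> *HeadOrUnclassedDivs(NSString *html) {
    // Same encoding as in the question.
    NSData *data = [html dataUsingEncoding:NSUnicodeStringEncoding];
    TFHpple *parser = [TFHpple hppleWithHTMLData:data];

    // not(@class) is true when the element has no class attribute at all.
    NSString *query = @"//div[@class='head' or @class='' or not(@class)]";
    return [parser searchWithXPathQuery:query];
}
```

Note that a nested div without a `class` attribute would match as well; in the sample data the inner divs all carry `class="cell"`, so only the outer ones are returned.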

You wrote:

Getting first level using TFHpple

I assume you mean: without also getting all descendants?

Taking your other requirements into account, you can do so as follows:

//div[not(ancestor::div)][@class='head' or @class='']

Dissecting this:

  • Select all div elements (yes, correct term ;) at any level in the whole document: //div
  • Filter with a predicate (the thing between brackets) for elements that are not themselves inside another div, by checking that there is no div ancestor (parent, parent of a parent, and so on): [not(ancestor::div)]
  • Filter by your other requirements: [@class='head' or @class='']

Note 1: your given XML is not well-formed: it contains multiple root elements, and an XML document must have exactly one.
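
On why `/div` finds nothing: as far as I know, `hppleWithHTMLData:` goes through libxml2's HTML parser, which does not reject a fragment with several top-level elements but wraps it in implied `html` and `body` elements. The document root is then `html`, so the absolute path `/div` matches nothing, while the first-level divs sit at `/html/body/div`. A sketch under that assumption (the function name `TopLevelDivs` is hypothetical):

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper: returns the outermost <div> elements of an HTML
// fragment, assuming the parser wrapped the fragment in <html><body>.
static NSArray<TFHppleElement *> *TopLevelDivs(NSString *html) {
    NSData *data = [html dataUsingEncoding:NSUnicodeStringEncoding];
    TFHpple *parser = [TFHpple hppleWithHTMLData:data];

    // Direct children of <body> only; nested divs are not matched.
    NSArray<TFHppleElement *> *divs =
        [parser searchWithXPathQuery:@"/html/body/div"];

    for (TFHppleElement *div in divs) {
        // objectForKey: returns nil when the attribute is missing.
        NSLog(@"top-level div, class = %@", [div objectForKey:@"class"]);
    }
    return divs;
}
```

Because this path selects by position rather than by `class`, it should also pick up divs whose `class` is empty or missing.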

Note 2: if your requirements are to first get all divs by @class or empty @class, and then only those that are "first level", reverse the predicates:

//div[@class='head' or @class=''][not(ancestor::div)]
Abel