I am trying to parse the below link using hpple:

http://www.decanter.com/news/wine-news/529748/mimimum-pricing-opponents-slam-cameron-speech

Code:

- (void)parseURL:(NSURL *)url {
    NSData *htmlData = [NSData dataWithContentsOfURL:url];    
    TFHpple *xpathParser = [[TFHpple alloc] initWithHTMLData:htmlData];
    NSArray *elements  = [xpathParser searchWithXPathQuery:@"<div class=\"body\" id=\"article-529748-body\">"];
    NSLog(@"elements %@",elements);
    TFHppleElement *element = [elements objectAtIndex:0];
    NSString *myTitle = [element content];
    [xpathParser release];
}

but it is crashing. Crash Report:

XPath error : Invalid expression
<div class="body" id="article-529748-body">
^
XPath error : Invalid expression
<div class="body" id="article-529748-body">
^

How can I solve this issue? Why is my elements array empty? Am I parsing it the wrong way? I want to get the information inside that div tag.

Dee

3 Answers

0

Try changing this:

NSArray *elements  = [xpathParser searchWithXPathQuery:@"<div class=\"body\" id=\"article-529748-body\">"];

To:

NSArray *elements  = [xpathParser searchWithXPathQuery:@"//div[@class='body'][@id='article-529748-body']"];
JamMySon
  • This is how it's done in Ray Wenderlich's tutorial on how to parse HTML on iOS. I also know that Obj-C sometimes flips out when you forget to put an @ in front of a string. – JamMySon Aug 25 '14 at 20:31
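The change above helps because searchWithXPathQuery: expects an XPath expression, not a fragment of HTML markup, which is why the original query produced "XPath error : Invalid expression". Below is a minimal sketch of two ways to write such a query. The helper names XPathForDivWithID and XPathForDivWithClassAndID are made up for illustration and are not part of hpple; the sketch also assumes the attribute values contain no single quote.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of hpple): builds an XPath query that selects
// every <div> whose id attribute equals divID.
// Assumes divID contains no single quote.
static NSString *XPathForDivWithID(NSString *divID) {
    return [NSString stringWithFormat:@"//div[@id='%@']", divID];
}

// Hypothetical helper: same idea, but the <div> must match both class and id.
// Two predicates joined with "and" select the same nodes as writing
// [@class='...'][@id='...'] one after the other.
static NSString *XPathForDivWithClassAndID(NSString *className, NSString *divID) {
    return [NSString stringWithFormat:@"//div[@class='%@' and @id='%@']",
                                      className, divID];
}
```

Either string can then be passed to searchWithXPathQuery:, for example XPathForDivWithID(@"article-529748-body"). Since an id should be unique within a page, the id-only form is probably enough here.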
0

Writing this (2 years later!) in case it's useful to someone else with a similar problem.

In order to parse the HTML within the div, you need to

  1. use XPath syntax similar to that quoted by JamMySon on this page (the single quotes don't need to be escaped)
  2. remember that [element content] only gives you the content (if any) of that node, NOT of its children.

Because of this you may need to use recursion to walk through the div's node tree.

Code (ARC):

- (void) decanterHpple{
    NSURL *url = [NSURL URLWithString:@"http://www.decanter.com/news/wine-news/529748/mimimum-pricing-opponents-slam-cameron-speech"];
    NSData *htmlData = [NSData dataWithContentsOfURL:url];

    TFHpple *pageParser = [TFHpple hppleWithHTMLData:htmlData];

    // 1. Unescaped single quotes (') work. 2. No need to match on class when matching on id.
    NSString *queryString = @"//div[@id='article-529748-body']";
    NSArray *elements = [pageParser searchWithXPathQuery:queryString];

    //old code ~ slightly amended
    if([elements count]){
        TFHppleElement *element = [elements objectAtIndex:0];
        NSString *myTitle = [element content];
        NSLog(@"myTitle:%@",myTitle );
    }
    //new code
    NSString *theText = [self stringFromWalkThruNodes:elements];
    NSLog(@"theText:%@",theText );
}

using this recursive method:

- (NSString*) stringFromWalkThruNodes:(NSArray*) nodes {
    static int level = 0;//level is only useful for keeping track of recursion when stepping through with a breakpoint
    level++;//put breakpoint here...
    NSString *text = @"";
    for (TFHppleElement *element in nodes){
        if (element.content) {
            text = [text stringByAppendingString:element.content];
        }
        if (element.children) {
            NSString *innerText = [self stringFromWalkThruNodes:element.children];
            text = [text stringByAppendingString:innerText];
        }
    }
    level--;
    return text;
}

This gives the output:

2014-10-22 19:44:07.996 Decanted[10148:a0b] myTitle:(null)

2014-10-22 19:44:07.997 Decanted[10148:a0b] theText:

On a visit to a hospital in north-east England, Mr Cameron is to call for the drinks industry to do more to tackle a problem which costs the National Health Service £2.7bn a year.A ban on the sale of alcohol below cost price - less than the tax paid on it - is set to be introduced in England and Wales from 6 April, but ministers are expected to push for a higher minimum price for drink.Opponents of a minimum unit price say it is unfair because it penalises all drinkers, not just binge or problem drinkers.Responding to the Prime Minister’s comments, Wine and Spirit Trade Association spokesman Gavin Partington reiterated the drinks indusry’s commitment ‘to helping the Government tackle alcohol misuse, alongside other stakeholders.‘This is why we are working hard through the Public Health Responsibility Deal on a range of initiatives to promote responsible drinking.’These initiatives, Partington said, include the expansion of Community Alcohol Partnerships across the UK and a national campaign by retailers to raise consumer awareness about the units of alcohol in alcoholic drinks.Partington said, ‘Unlike these measures, minimum unit pricing is a blunt tool which would both fail to address the problem of alcohol misuse and punish the vast majority of responsible consumers. 
As Government ministers acknowledge, it is also probably illegal'.Decanter is also against the scheme, calling it ‘fundamentally flawed.’‘The real problem,’ editor Guy Woodward has said, ‘lies with supermarkets who use wine as a loss-leader, slashing margins, bullying suppliers and dragging down prices in order to attract customers…Selling wine at a loss helps neither consumers nor the trade.’Other opponents of the scheme include the British Beer and Pub Association, which told the BBC there was ‘a danger it would be done through higher taxation, which would be hugely damaging to pub-goers, community pubs and brewers, costing thousands of vital jobs.’It is thought any move toward minimum pricing could also be illegal under European competition law, which is aimed at pushing down prices for consumers and allowing firms to operate in a free market.

PS. Only started playing with Hpple this p.m. after reading the aforementioned Wenderlich tutorial; I'm sure someone more experienced may come up with a more elegant solution!

cate
0

Check that your elements array is not empty before you index into it:

- (void)parseURL:(NSURL *)url {
    NSData *htmlData = [NSData dataWithContentsOfURL:url];
    TFHpple *xpathParser = [[TFHpple alloc] initWithHTMLData:htmlData];
    // The query must be an XPath expression, not the HTML tag itself.
    NSArray *elements  = [xpathParser searchWithXPathQuery:@"//div[@id='article-529748-body']"];
    NSLog(@"elements %@",elements);
    // Only touch index 0 when the query actually matched something.
    if([elements count]){
        TFHppleElement *element = [elements objectAtIndex:0];
        NSString *myTitle = [element content];
        NSLog(@"myTitle %@",myTitle);
    }
    [xpathParser release];
}
rakeshNS
  • @Dee Try to parse a simple HTML file like "http://www.gnu.org/licenses/gpl.html" using the same code. If you can parse it then the problem will be with hpple. Unlike XML, HTML is not as easy to parse. – rakeshNS Feb 18 '12 at 12:41
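One way to try the suggestion above without depending on the network is to run the query against a small piece of HTML held in memory. The following is only a sketch: CountMatches and SampleHTMLData are hypothetical helpers, the markup is made up to stand in for the real article page, and it assumes hpple and libxml2 are already set up in the project.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h" // hpple header; assumes hpple and libxml2 are already in the project

// Hypothetical helper: returns how many nodes in htmlData match xpathQuery.
static NSUInteger CountMatches(NSData *htmlData, NSString *xpathQuery) {
    if (htmlData == nil) {
        return 0; // nothing was downloaded or loaded, so there is nothing to search
    }
    TFHpple *parser = [TFHpple hppleWithHTMLData:htmlData];
    NSArray *elements = [parser searchWithXPathQuery:xpathQuery];
    return [elements count];
}

// Made-up markup standing in for the real article page.
static NSData *SampleHTMLData(void) {
    NSString *html = @"<html><body>"
                     @"<div class=\"body\" id=\"sample-body\"><p>Hello</p></div>"
                     @"</body></html>";
    return [html dataUsingEncoding:NSUTF8StringEncoding];
}
```

If CountMatches(SampleHTMLData(), @"//div[@id='sample-body']") finds the div but the same kind of query finds nothing on the live page, the page's markup or the download is the more likely culprit than the query syntax.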