I'm trying to use TFHpple to scrape web pages, but I don't know the syntax for searchWithXPathQuery.

I need to get the title, the description, and a list of images from an arbitrary web page.

My current code looks like this:

    NSData *data = [NSData dataWithContentsOfURL:[NSURL URLWithString:@"http://www.google.com"]];
    TFHpple *doc = [[TFHpple alloc] initWithHTMLData:data];

    NSArray *arr = [doc searchWithXPathQuery:@"//title"];
    TFHppleElement *titleElem = [arr firstObject];
    NSString *titleStr = titleElem.text;
    NSLog(@"arr = %@", arr);

I would expect this to retrieve all nodes:

    arr = [doc searchWithXPathQuery:@"//"];

but it doesn't.

I don't mind switching to another framework.

What is the best strategy to do this?

Avba

1 Answer

Check out HTMLReader (https://github.com/nolanw/HTMLReader), which lets you query the page with CSS selectors. For example, to get the title:

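    #import <HTMLReader/HTMLReader.h>

    NSURL *sUrl = [NSURL URLWithString:@"http://www.apple.com/"];
    // Synchronous fetch; run this off the main thread.
    NSData *htmlData = [NSData dataWithContentsOfURL:sUrl];
    // NSData is not NUL-terminated, so decode it with an explicit encoding
    // instead of stringWithUTF8String:.
    NSString *markUp = [[NSString alloc] initWithData:htmlData encoding:NSUTF8StringEncoding];
    HTMLDocument *site = [HTMLDocument documentWithString:markUp];
    NSString *siteTitle = [site firstNodeMatchingSelector:@"title"].textContent;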
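
To cover the rest of the question (description and images), here is a rough sketch along the same lines. It assumes the page declares its description in a `<meta name="description">` tag, and `PageSummaryFromHTML` is a made-up helper name for illustration, not part of HTMLReader.

```objc
#import <Foundation/Foundation.h>
#import <HTMLReader/HTMLReader.h>

// Hypothetical helper (not part of HTMLReader): pulls the title, the meta
// description, and every <img> src out of an HTML string.
static NSDictionary *PageSummaryFromHTML(NSString *markUp) {
    HTMLDocument *site = [HTMLDocument documentWithString:markUp];

    // If there is no <title>, the selector returns nil and so does textContent.
    NSString *title = [site firstNodeMatchingSelector:@"title"].textContent;

    // Assumption: the description is in <meta name="description" content="...">.
    HTMLElement *meta = [site firstNodeMatchingSelector:@"meta[name='description']"];
    NSString *description = meta.attributes[@"content"];

    // Collect the src attribute of every <img>, skipping images without one.
    NSMutableArray *images = [NSMutableArray array];
    for (HTMLElement *img in [site nodesMatchingSelector:@"img"]) {
        NSString *src = img.attributes[@"src"];
        if (src) {
            [images addObject:src];
        }
    }

    // Missing title/description fall back to empty strings.
    return @{ @"title": title ?: @"",
              @"description": description ?: @"",
              @"images": images };
}
```

The `src` values come back exactly as written in the page, so relative ones still need resolving against the page URL, e.g. with `+[NSURL URLWithString:relativeToURL:]`. If you would rather stay with TFHpple, the corresponding XPath queries should be `//meta[@name='description']` and `//img`, reading the attributes with `objectForKey:`. Note that `//` on its own is not a complete XPath expression; `//*` is the query that selects every element.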
JoshR604