
Is it possible to get all the text content of an element's children, recursively, in hpple? Is there a method in the TFHppleElement class for this, similar to this JavaScript:

document.getElementById("testdiv").textContent
karim

2 Answers


I'm using this code to get the content of the news titles:

NSURL *newURL = [NSURL URLWithString:@"http://somesite"];
// Synchronous download: fine for a quick test, but it blocks the calling thread.
NSData *newsData = [NSData dataWithContentsOfURL:newURL];

TFHpple *newsParser = [TFHpple hppleWithHTMLData:newsData];

NSString *newsXpathQueryString = @"//div[@class='item column-1']";
NSArray *newsNodes = [newsParser searchWithXPathQuery:newsXpathQueryString];

NSMutableArray *newNews = [[NSMutableArray alloc] initWithCapacity:0];

for (TFHppleElement *element in newsNodes)
{
    News *news = [[News alloc] init];
    [newNews addObject:news];

    // Trim leading and trailing whitespace and newlines from the element's content.
    news.title = [[element content] stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];

    // Value of the element's "src" attribute, or nil if it has none.
    news.photo_url = [element objectForKey:@"src"];
}

// Publish the results and refresh the table once, after the loop.
_allNews = newNews;
[self.tableView reloadData];

You can use

news.title = [[element firstChild] content];

to get the content of the element's first child node.
SergStav
Thanks. I wanted something like the code below, which I have posted as an answer. I know about firstChild, but I wanted all the text content of the children, like what 'textContent' returns. The content is spread over div and span elements. – karim Apr 10 '15 at 07:51

I wanted something like this. It is quick boilerplate code, not an elegant solution, since it relies on the static contents variable. Please let me know how it can be improved :)

#pragma mark - Hpple XML parser

/* The document contains lots of nested div, table, span, style, etc. elements. */
- (NSString *) extractDefinition
{
    NSString *html = [self.webView stringByEvaluatingJavaScriptFromString: @"document.getElementById('innerframe').innerHTML"];
    // [html length] is 0 both for a nil result and for an empty string.
    if ([html length] == 0) {
        return nil;
    }

    return [self extractSubDiv:html];
}

- (NSString *)extractSubDiv:(NSString *)html
{
    TFHpple *hppleParser = [TFHpple hppleWithHTMLData:[html dataUsingEncoding:NSUTF8StringEncoding]];

    // Try the two-column layout first, then fall back to the single-column one.
    NSArray *xpathQueries = @[@"//div[@id='columnboth']", @"//div[@id='columnsingle']"];
    for (NSString *xpathQuery in xpathQueries) {
        NSArray *defNodes = [hppleParser searchWithXPathQuery:xpathQuery];
        if ([defNodes count] > 0) {
            TFHppleElement *element = [defNodes objectAtIndex:0];
            return [self parseContents:element];
        }
    }
    return nil;
}

- (NSString *) parseContents:(TFHppleElement *)element {
    NSArray * innhold = [element searchWithXPathQuery:@"//div[contains(@class,'articlecontents')]"];
    return [self getTextFromArray:innhold];
}


static NSMutableString * contents;

- (NSString *) getTextFromArray:(NSArray *)hppleElements {
    NSMutableString * text = [NSMutableString string];
    for (TFHppleElement * e in hppleElements) {
        // Reset the shared buffer for each element; otherwise getText: returns
        // the text of all previous elements again and it is appended twice.
        contents = [NSMutableString string];
        [text appendFormat:@"%@ ", [self getText:e]];
    }
    return text;
}

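/* The text sits in spans inside more nested divs, so walk the tree
   depth-first and collect every text node into contents. */
- (NSString *) getText:(TFHppleElement *)element
{
    if ([element isTextNode]) {
        [contents appendFormat:@" %@", element.content];
    }

    // Recurse into all children; the collected text accumulates in contents.
    for (TFHppleElement * e in element.children) {
        [self getText:e];
    }

    return contents;
}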
karim
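On how this could be improved: one option is to move the recursion into a category on TFHppleElement and pass the buffer down explicitly, so no static variable is needed. This is a minimal sketch, assuming hpple's isTextNode, content and children behave the way they are used in the code above; the category name TextContent, the method tfh_textContent and the helper TFHAppendText are hypothetical names, not part of hpple.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"
#import "TFHppleElement.h"

// Hypothetical category; the tfh_ prefix avoids clashing with any method
// a later hpple release might add to TFHppleElement.
@interface TFHppleElement (TextContent)
- (NSString *)tfh_textContent;
@end

@implementation TFHppleElement (TextContent)

// Hypothetical helper: depth-first walk that appends each text node's
// content to buffer in document order, similar to the DOM's textContent.
static void TFHAppendText(TFHppleElement *element, NSMutableString *buffer)
{
    if ([element isTextNode]) {
        NSString *text = element.content;
        if (text != nil) {
            [buffer appendString:text];
        }
        return; // a text node has no children to visit
    }
    for (TFHppleElement *child in element.children) {
        TFHAppendText(child, buffer);
    }
}

- (NSString *)tfh_textContent
{
    // A fresh buffer per call: no shared static state, so repeated or
    // nested calls cannot leak text into each other.
    NSMutableString *buffer = [NSMutableString string];
    TFHAppendText(self, buffer);
    // Return an immutable, autoreleased copy (works under ARC and MRC).
    return [NSString stringWithString:buffer];
}

@end
```

With the category in place, the loop in getTextFromArray: could call [e tfh_textContent] instead of getText:. Unlike getText:, this sketch does not insert a space before each text node, which is closer to what textContent returns; if words on the page are separated only by markup, add a separator in TFHAppendText.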