
So I'm using HPPLE to do some XPath queries in an iOS app that needs to do some basic web scraping of a few sites. Right now everything works pretty well, but I wanted to see if there's another, more elegant way of doing what I'm doing. Currently I use XPath to find a specific div class on a website. Within that div (which is basically a post) there can be any number of children that contain text, and others whose text is buried in another set of children. Right now I'm basically using repeated for loops to check whether an element's tagName is "text" and, if so, append its content to a string, and then check whether there is another level of children that needs to be scanned. So far I have 4 levels of the same search. I was wondering if there is some method I can run that will repeat the same search whenever the current level has more than 0 children. Below is the code for how I'm doing this now:

for (TFHppleElement *element in searchNodes) {
    //If a Text Node is found add it to the String, if not search again with next layer
    if ([element.tagName isEqualToString:@"text"]) {
        [bigString appendString:element.content];
    }
    //1. First layer Scan
    if (element.children.count > 0) {
        for (TFHppleElement *nextStep in element.children) { 
            if ([nextStep.tagName isEqualToString:@"text"]) {
                [bigString appendString:nextStep.content];
            }
            
            //2. Second layer Scan
            if (nextStep.children.count > 0) {
                for (TFHppleElement *child in nextStep.children) { 
                    if ([child.tagName isEqualToString:@"text"]) {
                        [bigString appendString:child.content];
                        
                    }
                    
                    //3. Third Layer Scan
                    if (child.children.count > 0) {
                        for (TFHppleElement *children in child.children) { 
                            if ([children.tagName isEqualToString:@"text"]){
                                [bigString appendString:children.content];
                            }
                            
                            //4. Fourth Layer Scan
                            if (children.children.count > 0) {
                                for (TFHppleElement *newchild in children.children){
                                    if ([newchild.tagName isEqualToString:@"text"]) {
                                        [bigString appendString:newchild.content];
                                        
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

I would like to build some kind of method that I can pass the initial NSArray to, which checks for child elements and then performs the search again on the next array, all while building up an NSMutableString that ends up containing the text from every search. If not, what I have now seems to be working fine; I just wanted to see if there was a cleaner way of doing this.

Sal Aldana

1 Answer


I think what you want is recursion. You can write a recursive method that you pass an element to, have it modify some NSMutableString outside itself (an instance variable, maybe?), and then have it call itself on each of its children, if it has any. For example (uncompiled, untested):

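@property (nonatomic, retain) NSMutableString * bigString;
// snip
@synthesize bigString;
// snip - assume bigString gets initialized somewhere

// Appends this element's text, then recurses into every child,
// so text nested at any depth ends up in bigString.
- (void)checkElement:(TFHppleElement *)element {
    // appendString: raises an exception when passed nil, so skip empty text nodes.
    if ([element.tagName isEqualToString:@"text"] && element.content != nil) {
        [bigString appendString:element.content];
    }

    // Fast enumeration over an empty (or nil) children array does nothing,
    // so no separate count check is needed before recursing.
    for (TFHppleElement * child in element.children) {
        [self checkElement:child];
    }
}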
Tim
  • Thanks Tim, I'll give that a shot. I was actually trying something similar but was having trouble with the mutable string, as it was starting to get null values and crashing. If this works out I'll mark this as answered. – Sal Aldana Jul 12 '12 at 17:15
  • Hey Tim, this pretty much worked as is. My initial problem with it was that I was initializing the NSMutableString within the method and that was giving me some issues, moving the string outside of the method seems to have cleared up the mess that I made. Thanks. – Sal Aldana Jul 13 '12 at 16:15
  • Yeah, you'll have to keep initialization separate - otherwise you may wind up overwriting the string every time you recurse. A pattern I see a lot of people follow is to have one method responsible for initialization (such as creating your string) and for making the *first* call to a second, recursive method, maybe called `checkElementRecursive:`, that calls itself repeatedly. That way you expose a consistent API (only the `checkElement:` method), keep your initialization in one place, and still have the benefits of recursion. – Tim Jul 13 '12 at 16:30
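The entry-point-plus-recursive-helper pattern from the last comment might look roughly like the following. This is a minimal sketch under a few assumptions: it assumes ARC, and so that it compiles with Foundation alone it defines a hypothetical `Node` class that mimics only the `tagName`, `content` and `children` properties of `TFHppleElement` used in the question. `TextCollector`, `textFromElements:` and `appendTextFromElement:toString:` are illustrative names, not part of hpple. Unlike the answer above, this variant passes the mutable string as an argument instead of keeping it in a property, so each call to the entry point starts with a fresh string.

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for TFHppleElement (assumption, not hpple's class):
// it exposes only the three properties the question relies on. Assumes ARC.
@interface Node : NSObject
@property (nonatomic, copy) NSString *tagName;
@property (nonatomic, copy) NSString *content;
@property (nonatomic, copy) NSArray *children;
@end

@implementation Node
@end

// Hypothetical helper class; the method names are illustrative only.
@interface TextCollector : NSObject
- (NSString *)textFromElements:(NSArray *)elements;
@end

@implementation TextCollector

// Recursive helper: appends this node's text, then descends into its
// children to any depth. The string is passed in, never re-created here.
- (void)appendTextFromElement:(Node *)element toString:(NSMutableString *)result {
    if ([element.tagName isEqualToString:@"text"] && element.content != nil) {
        [result appendString:element.content];
    }
    for (Node *child in element.children) {
        [self appendTextFromElement:child toString:result];
    }
}

// Public entry point: creates the accumulator exactly once, then starts
// the recursion for each top-level element (e.g. the XPath results).
- (NSString *)textFromElements:(NSArray *)elements {
    NSMutableString *result = [NSMutableString string];
    for (Node *element in elements) {
        [self appendTextFromElement:element toString:result];
    }
    return [result copy];
}

@end
```

In the app you would swap `Node` for `TFHppleElement` and pass in the `searchNodes` array from the XPath query. Because the helper follows `children` to whatever depth exists, the fixed four levels of nested loops are no longer needed.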