
I have an encoding problem when parsing a web page with hpple in Xcode.

- (void)loadTutorials {

NSURL *tutorialsUrl = [NSURL URLWithString:@"http://qrz.si/members/s55db/"];
NSData *tutorialsHtmlData = [NSData dataWithContentsOfURL:tutorialsUrl options:NSASCIIStringEncoding error:nil];


TFHpple *tutorialsParser = [TFHpple hppleWithHTMLData:tutorialsHtmlData];

NSString *tutorialsXpathQueryString = @"//td[@class='data']";
NSArray *tutorialsNodes = [tutorialsParser searchWithXPathQuery:tutorialsXpathQueryString];


NSMutableArray *newTutorials = [[NSMutableArray alloc] initWithCapacity:0];
for (TFHppleElement *element in tutorialsNodes) {
    Tutorial *tutorial = [[Tutorial alloc] init];
    [newTutorials addObject:tutorial];


    for (TFHppleElement *child in element.children) {
        if ([child.tagName isEqualToString:@"img"]) {
           // NSLog([child objectForKey:@"src"]);
        } else if ([child.tagName isEqualToString:@"p"]) {
            //NSLog([[child firstChild] content]);
            tutorial.title = [[child firstChild] content];
        }
    }
}

_objects = newTutorials;
[self.tableView reloadData];
}

The page should be UTF-8, as its source indicates, but I get weird characters in the output.

How can I force change encoding of the data? Any help would be highly appreciated!

b4d

2 Answers

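options:NSASCIIStringEncoding

is useless here. The documentation shows that the options: parameter of dataWithContentsOfURL:options:error: expects NSDataReadingOptions, not a string encoding, so this is not the right way to go. NSData only holds raw bytes and has no encoding of its own.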
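One way to check what the downloaded bytes actually contain is to load them with valid reading options and then try to decode them as UTF-8. The following is only a minimal sketch using Foundation; `LoadPageData` and `DecodeAsUTF8` are hypothetical helper names, not part of hpple or of the original code.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: downloads the raw bytes of a page.
// options: takes NSDataReadingOptions; 0 requests no special reading behaviour.
static NSData *LoadPageData(NSURL *url, NSError **error) {
    return [NSData dataWithContentsOfURL:url options:0 error:error];
}

// Hypothetical helper: interprets raw bytes as UTF-8 text.
// Returns nil when the bytes are not valid UTF-8.
static NSString *DecodeAsUTF8(NSData *data) {
    if (data == nil) {
        return nil;
    }
    return [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
}
```

Calling `DecodeAsUTF8(LoadPageData(url, &error))` and getting nil back for data that did download would suggest that the server is not sending valid UTF-8, whatever the page source claims.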

To set the encoding, one has to edit XPathQuery.m by Matt Gallagher, which I got from the same tutorial. Changing it had a visible effect on the output, but nothing fixed it, since the site was clearly UTF-8 encoded already.

The problems were server-side, and the administrator offered me good old plain XML instead :)
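When a problem like this is suspected to be server-side, it can help to look at the charset the server itself declares in its response. The sketch below is an assumption about how one might check that with Foundation only; `EncodingFromCharsetName` and `LogDeclaredCharset` are hypothetical helpers, and nothing here is known about what the server in question actually sent.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: maps a charset name such as "utf-8" (the kind of value
// NSURLResponse reports in textEncodingName) to an NSStringEncoding.
// Returns 0 when the name is nil or not recognised.
static NSStringEncoding EncodingFromCharsetName(NSString *name) {
    if (name == nil) {
        return 0;
    }
    CFStringEncoding cfEncoding =
        CFStringConvertIANACharSetNameToEncoding((__bridge CFStringRef)name);
    if (cfEncoding == kCFStringEncodingInvalidId) {
        return 0;
    }
    return CFStringConvertEncodingToNSStringEncoding(cfEncoding);
}

// Hypothetical helper: asynchronously fetches a URL and logs the charset
// declared in the response, if any.
static void LogDeclaredCharset(NSURL *url) {
    NSURLSessionDataTask *task = [[NSURLSession sharedSession]
          dataTaskWithURL:url
        completionHandler:^(NSData *data, NSURLResponse *response, NSError *error) {
            if (error != nil) {
                NSLog(@"Request failed: %@", error);
                return;
            }
            NSString *name = response.textEncodingName;
            NSLog(@"Declared charset: %@ (NSStringEncoding %lu)",
                  name, (unsigned long)EncodingFromCharsetName(name));
        }];
    [task resume];
}
```

A declared charset that differs from the `meta` tag in the page source, or no declared charset at all, would be one possible explanation for a parser guessing the wrong encoding.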

b4d

You are telling the NSData object that the content of the URL you are loading is ASCII, not UTF-8:

NSData *tutorialsHtmlData = [NSData dataWithContentsOfURL:tutorialsUrl options:NSASCIIStringEncoding error:nil];

It should be:

NSData *tutorialsHtmlData = [NSData dataWithContentsOfURL:tutorialsUrl options:NSUTF8StringEncoding error:nil];
rckoenes
  • I have tried with NSUTF8StringEncoding also, but encoding does not change, funny chars are still here :( – b4d Jan 22 '13 at 16:21
  • I've tried copying the whole table to [link](http://b4d.sablun.org/xpath.html) and if I parse this link, UTF-8 encoding is read properly. But if I parse the original site UTF-8 breaks. – b4d Jan 22 '13 at 17:51