I've been struggling all day trying to find a class that converts/decodes ASCII characters to readable text.

I found the method below here on Stack Overflow, and it converts many of the characters to readable text. But I'm still struggling with, for example:

&#44;
&#46;
&#58;
&#39;

...and so forth.

I'm reading my data from an XML file with TBXML, and the encoding of the XML is:

iso-8859-1

Does anybody have a method that converts/decodes all the ASCII characters to readable text?

- (NSString *)stringByDecodingXMLEntities {
    NSUInteger myLength = [self length];
    NSUInteger ampIndex = [self rangeOfString:@"&" options:NSLiteralSearch].location;

    // Short-circuit if there are no ampersands.
    if (ampIndex == NSNotFound) {
        return self;
    }
    // Make result string with some extra capacity.
    NSMutableString *result = [NSMutableString stringWithCapacity:(myLength * 1.25)];

    // First iteration doesn't need to scan to & since we did that already, but for code simplicity's sake we'll do it again with the scanner.
    NSScanner *scanner = [NSScanner scannerWithString:self];

    [scanner setCharactersToBeSkipped:nil];

    NSCharacterSet *boundaryCharacterSet = [NSCharacterSet characterSetWithCharactersInString:@" \t\n\r;"];

    do {
        // Scan up to the next entity or the end of the string.
        NSString *nonEntityString;
        if ([scanner scanUpToString:@"&" intoString:&nonEntityString]) {
            [result appendString:nonEntityString];
        }
        if ([scanner isAtEnd]) {
            goto finish;
        }
        // Scan either a HTML or numeric character entity reference.
        if ([scanner scanString:@"&amp;" intoString:NULL])
            [result appendString:@"&"];
        else if ([scanner scanString:@"&apos;" intoString:NULL])
            [result appendString:@"'"];
        else if ([scanner scanString:@"&quot;" intoString:NULL])
            [result appendString:@"\""];
        else if ([scanner scanString:@"&lt;" intoString:NULL])
            [result appendString:@"<"];
        else if ([scanner scanString:@"&gt;" intoString:NULL])
            [result appendString:@">"];
        else if ([scanner scanString:@"&#" intoString:NULL]) {
            BOOL gotNumber;
            unsigned charCode;
            NSString *xForHex = @"";

            // Is it hex or decimal?
            if ([scanner scanString:@"x" intoString:&xForHex]) {
                gotNumber = [scanner scanHexInt:&charCode];
            }
            else {
                gotNumber = [scanner scanInt:(int*)&charCode];
            }

            if (gotNumber) {
                // %C expects a unichar, so code points above U+FFFF are truncated here.
                [result appendFormat:@"%C", (unichar)charCode];

                [scanner scanString:@";" intoString:NULL];
            }
            else {
                NSString *unknownEntity = @"";

                [scanner scanUpToCharactersFromSet:boundaryCharacterSet intoString:&unknownEntity];


                [result appendFormat:@"&#%@%@", xForHex, unknownEntity];

                //[scanner scanUpToString:@";" intoString:&unknownEntity];
                //[result appendFormat:@"&#%@%@;", xForHex, unknownEntity];
                NSLog(@"Expected numeric character entity but got &#%@%@;", xForHex, unknownEntity);

            }

        }
        else {
            NSString *amp;

            [scanner scanString:@"&" intoString:&amp];      //an isolated & symbol
            [result appendString:amp];

            NSString *unknownEntity = @"";
            [scanner scanUpToString:@";" intoString:&unknownEntity];
            NSString *semicolon = @"";
            [scanner scanString:@";" intoString:&semicolon];
            [result appendFormat:@"%@%@", unknownEntity, semicolon];
            NSLog(@"Unsupported XML character entity %@%@", unknownEntity, semicolon);

        }

    }
    while (![scanner isAtEnd]);

finish:
    return result;
}
Jonathan Leffler
Fernando Redondo
  • A note on terminology - These aren't "ASCII characters", they are "Numeric character entity references". – Stephen P Sep 14 '10 at 16:52
  • Aha, thank you. Do you know how I can accomplish what I want to do? I tried reading my XML document with NSXMLParser because of the answer I got from Anders, but the result was the same as with TBXML. – Fernando Redondo Sep 14 '10 at 17:10
  • Now I have also tried MWFeedParser's method stringByEncodingXMLEntities, which works on some characters. But there is still a long way to go with these - etc. – Fernando Redondo Sep 14 '10 at 17:48

1 Answer

Normally you would let NSXMLParser handle that job for you. You shouldn't need to do the conversion by hand.

If you search Google for NSXMLParser, you will find lots of examples.
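
As a rough illustration of letting the parser do the decoding, here is a minimal sketch. The class name TextCollector and the helper DecodedTextFromXMLData are made up for this example and are not part of NSXMLParser, and the code assumes ARC. It relies on NSXMLParser delivering character data through parser:foundCharacters: with numeric character references and the predefined entities already resolved.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate (not from the question): collects all character data
// reported by the parser so the decoded text can be inspected afterwards.
@interface TextCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableString *text;
@end

@implementation TextCollector

- (instancetype)init {
    if ((self = [super init])) {
        _text = [NSMutableString string];
    }
    return self;
}

// The parser may call this several times for one element, so always append.
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    [self.text appendString:string];
}

@end

// Hypothetical helper: parses the XML and returns its text content,
// or nil if parsing fails.
static NSString *DecodedTextFromXMLData(NSData *data) {
    TextCollector *collector = [[TextCollector alloc] init];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    parser.delegate = collector;
    return [parser parse] ? [collector.text copy] : nil;
}
```

For example, feeding it an ISO-8859-1 document whose text is "a&#44; b&#58; c&#39;s" (encoded with NSISOLatin1StringEncoding) should give back "a, b: c's". If the references still show up literally after parsing, one possible cause is that the source is escaped twice (for instance &amp;#44; instead of &#44;), in which case the parser correctly hands back the text &#44; and a second decoding pass would be needed.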

AndersK