0

I have a Chinese news feed and I want to break the text into smaller chunks to pass to an API.

How can I do this on iOS? For English I have set a chunk length of 50 characters.

Currently I am using the rangeOfString: method to find a period, comma, or similar mark and break the text into sentences there.

NSString *str  = nil, *rem = nil;

str = [final substringToIndex:MAX_CHAR_Private];
rem = [final substringFromIndex:MAX_CHAR_Private];
NSRange rng = [rem rangeOfString:@"?"];
if (rng.location == NSNotFound) {
    rng = [rem rangeOfString:@"!"];
    if (rng.location == NSNotFound) {
        rng = [rem rangeOfString:@","];
        if (rng.location == NSNotFound) {
            rng = [rem rangeOfString:@"."];
            if (rng.location == NSNotFound) {
                rng = [rem rangeOfString:@" "];
            }
        }
    }
}
if (rng.location+1 + MAX_CHAR_Private > MAXIMUM_LIMIT_Private) {
    rng = [rem rangeOfString:@" "];
}

if (rng.location == NSNotFound) {
    remaining = [[final substringFromIndex:MAX_CHAR_Private] retain];
}
else{
    //NSRange rng = [rem rangeOfString:@" "];
    str = [str stringByAppendingString:[rem substringToIndex:rng.location]];
    remaining = [[final substringFromIndex:MAX_CHAR_Private + rng.location+1] retain];
}

This does not work correctly for Chinese and Japanese text.

Larme
Shefali Soni

2 Answers

1

Check NSLinguisticTagger; it should work with Chinese.

From Apple: "The NSLinguisticTagger class is used to automatically segment natural-language text and tag it with information, such as parts of speech. It can also tag languages, scripts, stem forms of words, etc."

See Apple's documentation: NSLinguisticTagger Class Reference.

Also see the NSHipster article on NSLinguisticTagger.

Also see objc.io issue 7.

zaph
  • "If you’re on iOS, then you are currently (as of iOS 7) limited to English only. On OS X (as of 10.9/Mavericks) you have a slightly larger list available; the method +[NSLinguisticTagger availableTagSchemesForLanguage:] lists all schemes available for a given language. The likely reason for limiting the number on iOS is that the resource files take up a lot of space, which is fine on a laptop or desktop machine, but not so good on a phone or tablet." I got this from a tutorial. Do you have any other option? Please share. – Shefali Soni Jul 03 '14 at 02:51
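One option that does not depend on NSLinguisticTagger is to keep the punctuation search from the question but include the fullwidth marks used in Chinese and Japanese text. The following is only a rough sketch under assumptions: the helper name IndexAfterFirstBreak and the particular set of punctuation marks are made up for illustration, and unlike the question's code it takes the first mark of any kind instead of preferring "?" over "!" over ",".

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of any Apple API): returns the index just past
// the first break character found at or after `start`, or NSNotFound.
static NSUInteger IndexAfterFirstBreak(NSString *text, NSUInteger start) {
    if (start >= text.length) {
        return NSNotFound;
    }
    // ASCII marks plus common fullwidth CJK marks; extend this list as needed.
    NSCharacterSet *breaks =
        [NSCharacterSet characterSetWithCharactersInString:@"?!,.。，！？、；"];
    NSRange searchRange = NSMakeRange(start, text.length - start);
    NSRange found = [text rangeOfCharacterFromSet:breaks
                                          options:0
                                            range:searchRange];
    return (found.location == NSNotFound) ? NSNotFound : NSMaxRange(found);
}
```

Calling it with MAX_CHAR_Private as `start` would give a cut position comparable to the one the question computes, without relying on spaces, which Chinese text normally lacks.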
0

NSString provides this out of the box with the NSStringEnumerationBySentences enumeration option:

NSCharacterSet *whiteSpaceSet = [NSCharacterSet whitespaceAndNewlineCharacterSet];
[string enumerateSubstringsInRange:NSMakeRange(0, [string length])
                           options:NSStringEnumerationBySentences
                        usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
    // Trim whitespace left around the sentence boundary
    NSString *sentence = [substring stringByTrimmingCharactersInSet:whiteSpaceSet];
    // process sentence
}];
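To tie this back to the length limit in the question, here is a minimal sketch, assuming each sentence becomes one chunk and that any sentence longer than the limit is simply hard-split. ChunksFromText is a hypothetical helper name, lengths are counted in UTF-16 units as NSString does, and how well sentence detection handles a given Chinese or Japanese text should be checked on real data.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: splits `text` into chunks of at most `maxLength`
// UTF-16 units (maxLength must be > 0), preferring sentence boundaries.
static NSArray *ChunksFromText(NSString *text, NSUInteger maxLength) {
    NSMutableArray *chunks = [NSMutableArray array];
    NSCharacterSet *trimSet = [NSCharacterSet whitespaceAndNewlineCharacterSet];

    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:NSStringEnumerationBySentences
                          usingBlock:^(NSString *substring, NSRange substringRange,
                                       NSRange enclosingRange, BOOL *stop) {
        NSString *sentence = [substring stringByTrimmingCharactersInSet:trimSet];

        // Hard-split sentences that exceed the limit, without cutting a
        // composed character sequence (e.g. a surrogate pair) in half.
        while (sentence.length > maxLength) {
            NSRange seq = [sentence rangeOfComposedCharacterSequenceAtIndex:maxLength];
            NSUInteger cut = seq.location;
            if (cut == 0) {
                // A single sequence longer than the limit: take it whole to make progress.
                cut = NSMaxRange(seq);
            }
            [chunks addObject:[sentence substringToIndex:cut]];
            sentence = [sentence substringFromIndex:cut];
        }
        if (sentence.length > 0) {
            [chunks addObject:sentence];
        }
    }];
    return chunks;
}
```

A call such as ChunksFromText(final, 50) would then return the pieces to send to the API one by one. Merging several short sentences into one chunk is left out to keep the sketch small.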
Vladimir