
What's the simplest way, given a string:

NSString *str = @"Some really really long string is here and I just want the first 10 words, for example";

to get an NSString containing the first N (e.g., 10) words?

EDIT: I'd also like to make sure it doesn't fail if str has fewer than N words.

philfreo

4 Answers


If the words are space-separated:

NSInteger nWords = 10;
// Note: subarrayWithRange: raises NSRangeException if the string has fewer than nWords words
NSRange wordRange = NSMakeRange(0, nWords);
NSArray *firstWords = [[str componentsSeparatedByString:@" "] subarrayWithRange:wordRange];

If you want to break on all whitespace (spaces, tabs and newlines):

NSCharacterSet *delimiterCharacterSet = [NSCharacterSet whitespaceAndNewlineCharacterSet];
// Runs of consecutive whitespace produce empty strings in the resulting array
NSArray *firstWords = [[str componentsSeparatedByCharactersInSet:delimiterCharacterSet] subarrayWithRange:wordRange];

Then join the words back into a single string:

NSString *result = [firstWords componentsJoinedByString:@" "];
automaticoo
Barry Wark
  • You beat me to it: +1. Don't forget the componentsJoinedByString: since the OP was looking for an NSString result :) – Jarret Hardie Nov 18 '09 at 01:03
  • Does this work if the string only has 3 words? What is wordIndexes? (it appears unused in the first example) – philfreo Nov 18 '09 at 01:40
  • You'd have to change nWords if there are only three words. You could, of course find the componentsSeparatedByString and count them before deciding on nWords, but you didn't mention that as a requirement in your question. – Barry Wark Nov 18 '09 at 01:56
  • It's mentioned as a requirement now :) since this will be done to many strings loaded from a web service. So the simplest way is to use some kind of MIN function to set nWords? – philfreo Nov 18 '09 at 02:00
  • Looks to me like you can determine nWords however you like... it's just a variable used for the purposes of illustration here. If you decide nWords should be some percentage of space-separated words retrieved, rather than the literal number 10, then just multiply the [[str componentsSeparatedByString:@" "] count] by that percentage. – Jarret Hardie Nov 18 '09 at 03:07
  • The issue is that I do want the first 10 words, but I don't want it to crash if the string doesn't have 10 words, for instance. I'll do some conditional logic to set nWords then. – philfreo Nov 18 '09 at 03:31
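The clamping discussed in these comments can be sketched as a small C function on top of Foundation. The name FirstNWords is hypothetical (it is not a Foundation API). It is a minimal sketch that assumes words are separated by whitespace. It splits on all whitespace and drops the empty strings that runs of whitespace produce. It then uses MIN so that a string with fewer than n words returns all of them, instead of raising NSRangeException from subarrayWithRange:.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of Foundation): returns at most `n`
// whitespace-separated words of `str`, rejoined with single spaces.
static NSString *FirstNWords(NSString *str, NSUInteger n) {
    NSCharacterSet *delimiters = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSArray *parts = [str componentsSeparatedByCharactersInSet:delimiters];

    // Consecutive delimiters (e.g. two spaces) yield empty strings; drop them.
    NSPredicate *nonEmpty = [NSPredicate predicateWithFormat:@"length > 0"];
    NSArray *words = [parts filteredArrayUsingPredicate:nonEmpty];

    // Clamp the range length so short strings don't raise NSRangeException.
    NSRange wordRange = NSMakeRange(0, MIN(n, [words count]));
    return [[words subarrayWithRange:wordRange] componentsJoinedByString:@" "];
}
```

For example, FirstNWords(@"Some really  long\nstring", 3) gives @"Some really long", and FirstNWords(@"only three words", 10) returns the whole string rather than crashing.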

While Barry Wark's code works well for English, it is not the preferred way to detect word breaks. Many languages, such as Chinese and Japanese, do not separate words using spaces. And German, for example, has many compounds that are difficult to separate correctly.

What you want to use is CFStringTokenizer:

CFStringRef string = (__bridge CFStringRef)str; // e.g. the NSString from the question
CFLocaleRef locale = CFLocaleCopyCurrent();

CFStringTokenizerRef tokenizer = CFStringTokenizerCreate(kCFAllocatorDefault, string, CFRangeMake(0, CFStringGetLength(string)), kCFStringTokenizerUnitWord, locale);

CFStringTokenizerTokenType tokenType = kCFStringTokenizerTokenNone;
CFIndex tokensFound = 0, desiredTokens = 10; // or the desired number of tokens

// Check the count first so the tokenizer isn't advanced past the last wanted token
while (tokensFound < desiredTokens && kCFStringTokenizerTokenNone != (tokenType = CFStringTokenizerAdvanceToNextToken(tokenizer))) {
  CFRange tokenRange = CFStringTokenizerGetCurrentTokenRange(tokenizer);
  CFStringRef tokenValue = CFStringCreateWithSubstring(kCFAllocatorDefault, string, tokenRange);

  // Do something with the token
  CFShow(tokenValue);

  CFRelease(tokenValue);

  ++tokensFound;
}

// Clean up
CFRelease(tokenizer);
CFRelease(locale);
Marcus Adams
sbooth
  • @sbooth What if my string starts with an @ ... let's say like this comment: `@sbooth how are you`. How can I use the tokenizer to find something like ["@sbooth", "how", "are", "you"]? – Georg Oct 03 '16 at 14:13
  • @Georg I don't believe that type of tokenization is supported natively by `CFStringTokenizer`. For something like username detection you could examine the returned tokens for the username specifier (@) and append it to the ensuing token. Or if your set of allowed characters for usernames is well-defined you could use a regexp. – sbooth Oct 04 '16 at 10:58
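To get an actual NSString with the first N words out of the CFStringTokenizer loop above, one option is to record where the N-th token ends and take the prefix of the original string up to that point. The sketch below assumes that approach. The helper name PrefixOfFirstNWords is hypothetical. Keeping the original text between words, rather than rejoining with spaces, suits languages such as Chinese and Japanese that have no spaces to rejoin with. Word boundaries come from Core Foundation (Apple platforms only) and may vary with the current locale.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the prefix of `str` that ends with its n-th
// word, as found by CFStringTokenizer for the current locale. Punctuation
// and spacing between those words are kept exactly as in `str`.
static NSString *PrefixOfFirstNWords(NSString *str, CFIndex n) {
    CFStringRef cfStr = (__bridge CFStringRef)str;
    CFLocaleRef locale = CFLocaleCopyCurrent();
    CFStringTokenizerRef tokenizer = CFStringTokenizerCreate(
        kCFAllocatorDefault, cfStr, CFRangeMake(0, CFStringGetLength(cfStr)),
        kCFStringTokenizerUnitWord, locale);

    CFIndex end = 0;    // index just past the last word seen so far
    CFIndex found = 0;  // number of word tokens consumed
    // Check the count first so we never advance past the n-th token.
    while (found < n &&
           CFStringTokenizerAdvanceToNextToken(tokenizer) != kCFStringTokenizerTokenNone) {
        CFRange r = CFStringTokenizerGetCurrentTokenRange(tokenizer);
        end = r.location + r.length;
        ++found;
    }

    CFRelease(tokenizer);
    CFRelease(locale);
    // With fewer than n words, this is everything up to the last word.
    return [str substringToIndex:(NSUInteger)end];
}
```

With English text, PrefixOfFirstNWords(@"one two three", 2) should give @"one two". Asking for more words than the string contains returns the text up to its last word instead of failing.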

Based on Barry's answer, I wrote a method that also copes with strings that have fewer than N words (credit still goes to him):

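+ (NSString *)firstWords:(NSString *)theStr howMany:(NSInteger)maxWords {

    NSArray *theWords = [theStr componentsSeparatedByString:@" "];
    // Clamp to [0, word count] so short strings (or a negative maxWords) don't raise NSRangeException
    NSUInteger wordCount = MIN((NSUInteger)MAX(maxWords, 0), [theWords count]);
    // The range length is the number of words to keep (maxWords - 1 would drop the last word)
    NSRange wordRange = NSMakeRange(0, wordCount);
    NSArray *firstWords = [theWords subarrayWithRange:wordRange];
    return [firstWords componentsJoinedByString:@" "];
}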
philfreo

Here's my solution, derived from the answers here, to a related problem: removing the first word from a string (lowerString is the input string):

NSMutableArray *words = [NSMutableArray arrayWithArray:[lowerString componentsSeparatedByString:@" "]];
[words removeObjectAtIndex:0];
return [words componentsJoinedByString:@" "];
Pedro
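For the same task, a variant that skips building and rejoining an array is to cut the string just after its first space. The sketch below assumes plain space-separated text, as the split/remove/join code above does. The helper name DropFirstWord is hypothetical. For such text it should give the same result as that code, including returning an empty string when there is only one word.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: drops everything up to and including the first space.
// Same result as splitting on @" ", removing index 0 and rejoining,
// but without creating an intermediate array.
static NSString *DropFirstWord(NSString *str) {
    NSRange space = [str rangeOfString:@" "];
    if (space.location == NSNotFound) {
        return @"";  // zero or one word: nothing is left
    }
    return [str substringFromIndex:NSMaxRange(space)];
}
```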