
I want to get the words immediately before and after a word that I search for with a regex. I can already do this with the pattern below.

In this case I searched for the word "the", so I want the previous and next words around "the". The following pattern returns them successfully.

'\\b(?=(\\w+\\s+the|the\\s+\\w+)\\b)'

But this pattern has one issue: when the searched word is the first word on the page ("cite" in the sample text below) or the last ("attachments" in the sample text below), it is not found.

Sample Text

cite any cases or other legal materials that the arbitrator should read before the hearing attachments

I can also match the first and last word, but only with separate patterns.

For the first word:

'\\b(?=($+cite|cite\\s+\\w+)\\b)'

For the last word:

'\\b(?=(\\w+\\s+attachments|attachments+$)\\b)'

I want a single pattern that covers all three cases, whether the word is first, last, or in the middle.

I have tried various combinations, but without success.

Can anyone help me combine these into one pattern that returns the next and previous words?

Niks

2 Answers


You can use `(\w+)?\s+cite(\s+\w+)?|cite\s+(\w+)?`, or alternatively `(\w+)?\s*\bcite\b\s*(\w+)?` (using "cite" as the example search word).

Example string:

cite any cases or other legal materials cite that the arbitrator should read before the hearing attachments cite

Matches:

  • any
  • materials
  • that
  • attachments

See DEMO

karthik manchala
  • Will this give me all three possibilities with the single pattern you provided? I mean whether "cite" is the first word, the last word, or anywhere in the middle. – Niks Apr 09 '15 at 13:10
  • There is one issue with this when "cite" is in the middle. In the example from your DEMO, it should return "materials" (as the previous word) and "that" (as the next word). – Niks Apr 09 '15 at 13:16
  • OK, let me check with the edited answer. Thanks for your feedback. – Niks Apr 09 '15 at 13:17
  • The demo is perfect, but somehow when I try it in my iOS app it returns only the previous word when the searched word is somewhere in the middle. – Niks Apr 09 '15 at 13:26
  • Try this one.. `(\w+)?\s*cite\s*(\w+)?` – karthik manchala Apr 09 '15 at 13:42
  • Yes, it is almost done. Thank you very much, but there is still one minor issue: when I search for "the", it also returns a result for "other", because "other" contains "the". It should match only the whole word. Can you please help with this so that it matches my requirement exactly? Thanks, I really appreciate your help. – Niks Apr 09 '15 at 14:02
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/74839/discussion-between-karthik-manchala-and-niks). – karthik manchala Apr 09 '15 at 14:06

I think you can capture everything with the following regex, which uses optional capture groups, so there is no need for alternation:

(\w+)?\s*\b(cite)\b\s*(\w+)?

Demo

Do not forget to double the backslashes when you write the pattern in an Objective-C string literal.
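To make the escaping concrete, here is a minimal sketch of building the same pattern for an arbitrary search word. The helper name `NeighborPatternForWord` is hypothetical (it is not part of the answer), and the sketch assumes the search term starts and ends with a word character so that `\b` behaves as expected.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (an illustration, not part of the original answer):
// builds the whole-word pattern for an arbitrary search term.
static NSString *NeighborPatternForWord(NSString *word) {
    // Escape regex metacharacters so the search term is matched literally.
    NSString *escaped = [NSRegularExpression escapedPatternForString:word];
    // Each "\\" in the literal becomes a single backslash in the pattern,
    // so the regex engine sees: (\w+)?\s*\bWORD\b\s*(\w+)?
    return [NSString stringWithFormat:@"(\\w+)?\\s*\\b%@\\b\\s*(\\w+)?", escaped];
}
```

Because of the `\b` anchors, a pattern built for "the" should not match inside "other" or "another", which is the issue raised in the comments on the other answer.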

Sample working code:

#import <Foundation/Foundation.h>
#import <Foundation/NSTextCheckingResult.h>

int main (int argc, const char * argv[])
{
    NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];

    NSError *error = nil;
    // Backslashes are doubled because this is an Objective-C string literal.
    NSString *pattern = @"(\\w+)?\\s*\\bcite\\b\\s*(\\w+)?";
    NSString *string = @"cite any cases or other legal materials cite that the arbitrator should read before the hearing attachments cite";
    NSRange range = NSMakeRange(0, string.length);
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&error];
    if (regex == nil) {
        // The pattern failed to compile; report the error and stop.
        NSLog(@"Invalid pattern: %@", error);
        [pool drain];
        return 1;
    }
    NSArray *matches = [regex matchesInString:string options:0 range:range];
    for (NSTextCheckingResult *match in matches) {
        // The full match range covers the previous word, "cite", and the next word.
        NSRange matchRange = [match range];
        NSString *m = [string substringWithRange:matchRange];
        NSLog(@"Matched string: %@", m);
    }

    [pool drain];
    return 0;
}

Output:

2015-04-09 11:08:22.630 main[26] Matched string: cite any
2015-04-09 11:08:22.633 main[26] Matched string: materials cite that
2015-04-09 11:08:22.633 main[26] Matched string: attachments cite

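The sample above prints each whole match. If you need the previous and next words separately, a sketch along these lines reads the two optional capture groups instead. The helper name `NeighborWords` and the dictionary keys are hypothetical, and the sketch assumes the search word contains only word characters (it is inserted into the pattern unescaped).

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (an illustration, not part of the original answer):
// returns one dictionary per match with the neighbouring words stored under
// @"previous" and @"next". A missing neighbour is reported as an empty string.
static NSArray *NeighborWords(NSString *text, NSString *word) {
    // Assumption: `word` contains only word characters, so it is used as-is.
    NSString *pattern = [NSString stringWithFormat:@"(\\w+)?\\s*\\b%@\\b\\s*(\\w+)?", word];
    NSError *error = nil;
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }

    NSMutableArray *results = [NSMutableArray array];
    NSArray *matches = [regex matchesInString:text options:0 range:NSMakeRange(0, text.length)];
    for (NSTextCheckingResult *match in matches) {
        // Group 1 is the previous word, group 2 the next word. A group that did
        // not take part in the match has location NSNotFound.
        NSRange prevRange = [match rangeAtIndex:1];
        NSRange nextRange = [match rangeAtIndex:2];
        NSString *prev = (prevRange.location == NSNotFound) ? @"" : [text substringWithRange:prevRange];
        NSString *next = (nextRange.location == NSNotFound) ? @"" : [text substringWithRange:nextRange];
        [results addObject:@{ @"previous": prev, @"next": next }];
    }
    return results;
}
```

Calling `NeighborWords(sampleText, @"the")` on the sample text from the question should give "that"/"arbitrator" and "before"/"hearing". One caveat: each match consumes its neighbouring words, so when two occurrences are separated by a single word (for example "a the b the c"), the second match reports no previous word.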
Wiktor Stribiżew