2

I am relatively new to regular expressions and need some advice.

The goal is to get the data into an array in the following format:

  • value=777
  • value=888

From this data: "value=!@#777!@#value=@#$888*"

Here is my code (Objective C):

NSString *aTestString = @"value=!@#777!@#value=@#$**888***";
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"value=(?=[^\d])(\d)" options:0 error:&anError];

So my questions are:

1) Can the regex engine capture data that is split like that? That is, retrieve the "value=", drop the garbage data in the middle, and then group it with its number ("777", etc.)?

2) If this can be done, is my regex valid? value=(?=[^\d])(\d)

Just a coder

2 Answers

3

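The lookahead (?=...) is wrong here: it only asserts that a non-digit follows without consuming it, so the (\d) right after it can never match. You also haven't correctly escaped the \d (inside an Objective-C string literal it becomes \\d), and last but not least you left out the quantifiers * (0 or more times) and + (1 or more times):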

NSString *aTestString = @"value=!@#777!@#value=@#$**888***";
NSRegularExpression *regex = [NSRegularExpression
    regularExpressionWithPattern:@"value=[^\\d]*(\\d+)"
    options:0
    error:NULL
];

[regex 
    enumerateMatchesInString:aTestString
    options:0
    range:NSMakeRange(0, [aTestString length])
    usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
        NSLog(@"Value: %@", [aTestString substringWithRange:[result rangeAtIndex:1]]);
    }
];

Edit: Here's a more refined pattern. It catches a word before =, then discards non-digits and catches digits afterwards.

NSString *aTestString = @"foo=!@#777!@#bar=@#$**888***";
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(\\w+)=[^\\d]*(\\d+)" options:0 error:NULL];

[regex 
    enumerateMatchesInString:aTestString
    options:0
    range:NSMakeRange(0, [aTestString length])
    usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
        NSLog(
            @"Found: %@=%@",
            [aTestString substringWithRange:[result rangeAtIndex:1]],
            [aTestString substringWithRange:[result rangeAtIndex:2]]
        );
    }
];

// Output:
// Found: foo=777
// Found: bar=888
DarkDust
  • Hey man, you are correct. I just figured out your edit based on your first help and was about to comment :) One thing though: I will leave the pattern as (value=)[^\\d]*(\\d+) because the "value=" is always guaranteed. Thanks much for your answer. Flagging it as correct. – Just a coder Jan 21 '12 at 23:34
0

Regular expressions are expressions that match a given pattern. A regular expression could match, say, a string like "value=!@#777" using an expression like "value=[#@!%^&]*[0-9]+", which says to match the literal "value=", then any string made up of the characters #, @, !, %, ^, and &, and finally any non-empty string made up of digits. But you can't use a single regular expression by itself to get just the parts of the string that you want, i.e. "value=777".

So, one solution would be to create an expression that recognizes strings such as "value=!@#777", and then do some further processing on that string to remove the offending characters.

I think you'll be better off using NSScanner to scan the data and extract the parts you want. For example, you can use -scanString:intoString: to get the "value=" part, followed by -scanCharactersFromSet:intoString: to remove the part you don't want, and then call that method again to get the collection of digits.
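
The scanning steps described above could look roughly like the following. This is only a sketch: the function name `ScanValuePairs` is made up for illustration, it uses `-scanUpToString:intoString:` and `-scanUpToCharactersFromSet:intoString:` to skip unwanted text, and it assumes that only the ASCII digits 0-9 count as digits.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: walks the string with NSScanner and returns
// strings such as @"value=777".
static NSArray<NSString *> *ScanValuePairs(NSString *input) {
    NSScanner *scanner = [NSScanner scannerWithString:input];
    // By default NSScanner skips whitespace; turn that off so nothing
    // is dropped silently.
    scanner.charactersToBeSkipped = nil;

    NSCharacterSet *digits =
        [NSCharacterSet characterSetWithCharactersInString:@"0123456789"];
    NSString *const key = @"value=";
    NSMutableArray<NSString *> *pairs = [NSMutableArray array];

    while (!scanner.isAtEnd) {
        // Move to the next "value=" (if any) and consume it.
        [scanner scanUpToString:key intoString:NULL];
        if (![scanner scanString:key intoString:NULL]) {
            break;  // no further "value=" in the input
        }
        // Discard everything up to the first digit.
        [scanner scanUpToCharactersFromSet:digits intoString:NULL];
        // Collect the run of digits and rebuild "value=<digits>".
        NSString *number = nil;
        if ([scanner scanCharactersFromSet:digits intoString:&number]) {
            [pairs addObject:[key stringByAppendingString:number]];
        }
    }
    return [pairs copy];
}
```

For the string from the question, `ScanValuePairs(@"value=!@#777!@#value=@#$**888***")` should give `value=777` and `value=888`. Each scanner call does one small job, which is what makes this approach easy to step through in a debugger.
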

Caleb
  • Just a quick note, my experience is RegEx is much faster than NSScanner for anything complicated. – Abhi Beckert Jan 21 '12 at 23:06
  • Of course you can get parts of a matched string, that is what groups are used for. But in this case, you wouldn't get the final `value=777` as output but one match would give you a group for `value` (or `value=`) and a second one for `777` which you just need to combine. – DarkDust Jan 21 '12 at 23:30
  • @DarkDust I meant that a single regular expression can't get all the parts in a single step. You have to either match the parts that you want and recombine them afterward, or match longer sections and remove parts that you don't want, possibly using another expression. – Caleb Jan 22 '12 at 05:03
  • @AbhiBeckert I'm sure you're right, but speed isn't always the most important consideration. NSScanner is very simple to understand, use, and debug. – Caleb Jan 22 '12 at 05:05