I'm trying to create a regular expression for string comparison. The regular expression is: .*\bword.*

However, I want to ignore special characters and the comparison should work with and without them.

For example:

O'Reilly should match O'Reilly and oreilly

Is it possible to do this with a regular expression?

P.S.

This is to be used in iOS with NSPredicate. Currently, the predicate looks like:

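// "\\b" in the literal gives the regex word boundary \b; a single "\b" would be a backspace character
NSString *regexString = [NSString stringWithFormat:@".*\\b%@.*", word];
NSPredicate *predicate = [NSPredicate predicateWithFormat:@"%K matches[cd] %@", keypath, regexString];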

Since NSPredicate doesn't let me transform the keypath value (for example, replace it with a version that has no special characters), I need to handle this in the regular expression itself.
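For illustration, here is one way a pattern could tolerate special characters without touching the keypath value: strip them from the search word only, then allow an optional run of non-letters between each remaining letter. This is a minimal sketch in Perl (the language used in the answer below) and has not been tested against NSPredicate. It assumes that `matches` must match the whole value, which is why the pattern keeps the `.*` at both ends and the check uses `^...$`, and that ICU accepts the same `\b` and `\P{L}` syntax as Perl. `build_pattern` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a search word such as "O'Reilly" into a pattern
# that matches the word with or without punctuation between its letters.
sub build_pattern {
    my ($word) = @_;
    (my $letters = $word) =~ s/\P{L}+//g;          # "O'Reilly" -> "OReilly"
    # quotemeta is a no-op for letters; kept in case the filter is widened.
    my @chars = map { quotemeta } split //, $letters;
    # Allow any run of non-letters between consecutive letters.
    return '.*\b' . join('\P{L}*', @chars) . '.*';
}

my $pattern = build_pattern("O'Reilly");
print "$pattern\n";    # .*\bO\P{L}*R\P{L}*e\P{L}*i\P{L}*l\P{L}*l\P{L}*y.*

# Whole-string, case-insensitive check, standing in for matches[c].
for my $candidate ("O'Reilly", "oreilly", "Tim O'Reilly", "Reilly") {
    my $hit = $candidate =~ /^$pattern$/i ? "match" : "no match";
    print "$candidate: $hit\n";
}
```

The first three candidates match and `Reilly` does not. The [d] (diacritic-insensitive) flag has no equivalent in this sketch.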

mrodriguez

1 Answer


You might think about preprocessing your string before doing the match. If you have a list of acceptable characters (looking at your example, that is just a-z and A-Z), you can use the transliteration operator tr/// to remove all the other characters and lc to lower-case the string. The c flag on tr complements the match, i.e. it matches everything that is not listed, and the d flag deletes every matched character that has no replacement; since the replacement list is empty, that means everything that matched.

$string =~ tr/a-zA-Z//cd;
$string = lc $string;
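A minimal sketch of that preprocessing applied to both sides of the comparison, assuming the search word is cleaned the same way as the value. `simplify` is a hypothetical helper name, and the `\b` match stands in for the question's `\bword` pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: the tr///cd plus lc preprocessing, applied to a copy.
sub simplify {
    my ($string) = @_;
    $string =~ tr/a-zA-Z//cd;    # delete every character that is not an ASCII letter
    return lc $string;
}

my $word = simplify("O'Reilly");    # "oreilly"
for my $candidate ("O'Reilly", "oreilly", "Tim O'Reilly") {
    my $hit = simplify($candidate) =~ /\b\Q$word\E/ ? "match" : "no match";
    print "$candidate: $hit\n";
}
```

The last candidate prints `no match`: because spaces are deleted too, `Tim O'Reilly` becomes `timoreilly`, which has no word boundary before `oreilly`. Keeping the space in the list (`tr/a-zA-Z //cd`) is one way to preserve word boundaries, if that matters for your data.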

If you are using characters outside the ASCII range, you need to be a little cleverer.

use feature 'fc';    # fc needs Perl 5.16+ with this feature enabled
$string =~ s/\P{L}+//g;
$string = fc $string;

First we use a regex to remove every character that is not in the Unicode general category Letter (\P{L} matches anything outside it). Then we use the fc function to case-fold the string; this is the same case folding Perl uses for case-insensitive regex matches. Note that you might want to normalise the string first.
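One way to do the normalising, sketched under the assumption that NFD (canonical decomposition) from the core Unicode::Normalize module is what is wanted: decomposing first turns an accented letter into a base letter plus a combining mark, and because combining marks are not in category Letter, the `\P{L}` substitution strips them as well. That makes the comparison accent-insensitive, which may or may not be desirable; NFC would keep the accents. `simplify` is a hypothetical helper name.

```perl
use strict;
use warnings;
use feature 'fc';
use Unicode::Normalize qw(NFD);

# Hypothetical helper: decompose, drop everything that is not a letter, case-fold.
sub simplify {
    my ($string) = @_;
    $string = NFD($string);       # "é" becomes "e" followed by U+0301 (a combining mark)
    $string =~ s/\P{L}+//g;       # removes punctuation and the combining marks
    return fc $string;
}

print simplify("O'Reilly"), "\n";       # oreilly
print simplify("Ren\x{E9}e"), "\n";     # renee
```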

JGNI
  • The problem is that I can't preprocess the string before doing the match. I updated the question to make this clearer. – mrodriguez Dec 03 '18 at 14:25