
I'm trying to do a simple regex match using NSRegularExpression, but I'm having some problems matching the string when the source contains multibyte characters:

let string = "D 9"

// The following matches (any characters)(SPACE)(numbers)(any characters)
let pattern = "([\\s\\S]*) ([0-9]*)(.*)"

let slen : Int = string.lengthOfBytesUsingEncoding(NSUTF8StringEncoding)

var error: NSError? = nil

var regex = NSRegularExpression(pattern: pattern, options: NSRegularExpressionOptions.DotMatchesLineSeparators, error: &error)

var result = regex?.stringByReplacingMatchesInString(string, options: nil, range: NSRange(location:0,
length:slen), withTemplate: "First \"$1\" Second: \"$2\"")

The code above returns "D" and "9" as expected

If I now change the first line to include a UK 'Pound' currency symbol as follows:

let string = "£ 9"

Then the match doesn't work, even though the ([\\s\\S]*) part of the expression should still match any leading characters.

I understand that the £ symbol will take two bytes but the wildcard leading match should ignore those shouldn't it?

Can anyone explain what is going on here please?

PiotrWolkowski
NEIL STRONG
  • I'm not familiar with Swift and its regex engine, but in general I would be terribly surprised to find that `\s\S` isn't equivalent to `.` when Unicode is involved. Why aren't you using `.*` in the first grouping? That said, I'm not entirely convinced that that's where the problem is, either; I think it's more likely that `[0-9]` fails to match Unicode digits than that `\S` fails to match arbitrary non-space Unicode characters. – Kyle Strand Apr 20 '15 at 19:23
  • Swift *does* support the `\d` character class, so why are you using `[0-9]`? If you try matching with `(.*) (\d*)(.*)`, do you get a match? – Kyle Strand Apr 20 '15 at 19:27
  • Thanks Kyle. I was using \s\S because of a mis-reading of an article about misuse of the '.' character. I've changed it to "(.*) (\d*)(.*)" but it still fails to match. I'm beginning to suspect it is a bug in the Swift implementation - any other character matches OK - E.g. "D$+@ 9" but when I put a '£' symbol anywhere in the string to be matched, it fails! – NEIL STRONG Apr 22 '15 at 05:54

2 Answers


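It can be confusing. The first parameter of stringByReplacingMatchesInString() is mapped from NSString in Objective-C to String in Swift, but the range: parameter is still an NSRange. Therefore you have to specify the range in the units used by NSString, which are UTF-16 code units, not UTF-8 bytes. "£" takes two bytes in UTF-8 but only one UTF-16 code unit, so the UTF-8 byte length of "£ 9" is 4 while its NSString length is 3, and the range in the question extends past the end of the string. Use the NSString length instead: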

var result = regex?.stringByReplacingMatchesInString(string,
        options: nil,
        range: NSRange(location:0, length:(string as NSString).length),
        withTemplate: "First \"$1\" Second: \"$2\"")

Alternatively, you can use count(string.utf16) instead of (string as NSString).length.
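To make the difference between the units concrete, here is a minimal sketch in current Swift syntax (an assumption; the code in this answer targets Swift 1.x). The helper name `lengths` is hypothetical and only serves the illustration. It shows that the three counts agree for plain ASCII but diverge once a character such as "£" or an emoji appears:

```swift
import Foundation

// Hypothetical helper (not from the original answer): returns the length of a
// string measured in Characters, UTF-8 bytes, and UTF-16 code units.
func lengths(_ s: String) -> (characters: Int, utf8: Int, utf16: Int) {
    return (s.count, s.utf8.count, s.utf16.count)
}

for sample in ["D 9", "£ 9", "😀 9"] {
    let l = lengths(sample)
    // NSRange must be built from the utf16 value, which equals (sample as NSString).length.
    print("\(sample): characters=\(l.characters) utf8=\(l.utf8) utf16=\(l.utf16)")
}
```

Only the UTF-16 count is a safe length for an NSRange covering the whole string; the character count happens to work for "£ 9" but not for the emoji sample.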

Full example:

let string = "£ 9"

let pattern = "([\\s\\S]*) ([0-9]*)(.*)"
var error: NSError? = nil
let regex = NSRegularExpression(pattern: pattern,
        options: NSRegularExpressionOptions.DotMatchesLineSeparators,
        error: &error)!

let result = regex.stringByReplacingMatchesInString(string,
    options: nil,
    range: NSRange(location:0, length:(string as NSString).length),
    withTemplate: "First \"$1\" Second: \"$2\"")
println(result)
// First "£" Second: "9"
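For readers on a newer toolchain, the same example might look roughly like this in Swift 5. This is a sketch under assumptions: the function name `splitAmount` is hypothetical, and the throwing initializer and the `stringByReplacingMatches(in:options:range:withTemplate:)` spelling are the renamed forms of the calls used above.

```swift
import Foundation

// Hypothetical wrapper around the example above, written for Swift 5.
func splitAmount(_ string: String) throws -> String {
    // (any characters)(SPACE)(digits)(any characters)
    let pattern = "([\\s\\S]*) ([0-9]*)(.*)"
    let regex = try NSRegularExpression(pattern: pattern,
                                        options: [.dotMatchesLineSeparators])
    // The range length is given in UTF-16 code units, not UTF-8 bytes.
    let range = NSRange(location: 0, length: string.utf16.count)
    return regex.stringByReplacingMatches(in: string,
                                          options: [],
                                          range: range,
                                          withTemplate: "First \"$1\" Second: \"$2\"")
}

do {
    print(try splitAmount("£ 9"))
    // First "£" Second: "9"
} catch {
    print("Invalid pattern: \(error)")
}
```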
Martin R
  • Thank you Martin - that does explain why the length of the string containing the currency symbol was being reported as 4 rather than 3. It reports the length correctly now, but the expression still doesn't match I'm afraid. – NEIL STRONG Apr 22 '15 at 05:47
  • My apologies Martin - must be too early in the morning - your solution DID work!! Many thanks!! :-) – NEIL STRONG Apr 22 '15 at 06:08
  • Thank you soo much! I spent so much time trying to resolve this. – 3li Apr 06 '20 at 15:12

I've run into this a couple times and Martin's answer helped me understand the problem. Here's a quick version of the solution that worked for me.

If your regular expression function includes a range parameter built like this:

NSRange(location: 0, length: yourString.count)

You can change it to this:

NSRange(location: 0, length: yourString.utf16.count)
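Another option, assuming Swift 4 or later, is to let Foundation do the conversion with the `NSRange(_:in:)` initializer, which takes a range of `String.Index` values and produces the matching UTF-16 based range. A minimal sketch (the sample string is made up for illustration):

```swift
import Foundation

let yourString = "£ 9 😀"

// Built by hand from the UTF-16 code unit count.
let manual = NSRange(location: 0, length: yourString.utf16.count)

// Built by converting a Swift string range covering the whole string.
let converted = NSRange(yourString.startIndex..., in: yourString)

print(manual.location == converted.location && manual.length == converted.length)

// Character count and UTF-16 length differ here because of the emoji.
print(yourString.count, yourString.utf16.count)
```

Both forms describe the same range; the second avoids having to remember which unit NSRange expects.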
arlomedia