I'm currently using a regular expression to find some files my app previously created, but I have problems with German umlauts like ö, ä, ü. My expression doesn't match if there are umlauts in the string. I guess it has something to do with the locale, but I can't figure out which locale to set (I already tried nil).

Here's some code:

// Building the regex
// Each string can contain ö, ä, ü
let alternatives = stringArray.joined(separator: "|")
let regex = somePrefix + "_(" + alternatives + ")_w\\d_d\\d"

// Finding files
let fileManager = FileManager.default
let files = fileManager.enumerator(atPath: somePath)
while let fileName = files?.nextObject() as? String {
    if fileName.range(of: regex, options: .regularExpression, range: nil, locale: Locale.current) != nil {
        print(fileName + " found")
    }
}

// An example that didn't match:

Regex = reis 8_(Ibedir|Drölf )_w\d_d\d
File name that didn't match = reis 8_Drölf _w0_d0.plist
LarsGvB

1 Answer

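Apparently the file names use a different Unicode normalization form than the given strings. For example, "ö" can be stored either as the single precomposed code point U+00F6 or as "o" followed by the combining diaeresis U+0308; the two look identical, but a regular expression compares them code point by code point. The Unicode Regular Expression Guidelines: 3.2 Canonical Equivalents suggest: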

Before (or during) processing, translate text (and pattern) into a normalized form. This is the simplest to implement, since there are available code libraries for doing normalization.

This can be achieved by applying .decomposedStringWithCanonicalMapping to both the pattern and the file name.
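
A minimal sketch of that idea follows. The helper name matchesNormalized is hypothetical and not part of any API, and the sample strings are built with explicit escapes on the assumption that the pattern holds a precomposed "ö" while the file system returns a decomposed one:

```swift
import Foundation

// Hypothetical helper: returns true if `fileName` matches the regex `pattern`
// after both have been converted to the same normalization form (NFD).
func matchesNormalized(_ fileName: String, pattern: String) -> Bool {
    let normalizedName = fileName.decomposedStringWithCanonicalMapping
    let normalizedPattern = pattern.decomposedStringWithCanonicalMapping
    return normalizedName.range(of: normalizedPattern, options: .regularExpression) != nil
}

// Assumption: the pattern was typed with a precomposed "ö" (U+00F6).
let pattern = "reis 8_(Ibedir|Dr\u{00F6}lf )_w\\d_d\\d"
// Assumption: the file name uses a decomposed "ö" ("o" + U+0308).
let fileName = "reis 8_Dro\u{0308}lf _w0_d0.plist"

print(matchesNormalized(fileName, pattern: pattern))
```

With both sides decomposed, this should report a match. In the loop from the question, the same two conversions would be applied to regex once before the loop and to fileName on each iteration.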

Martin R