I have a string "Today is Monday and date is 12 januari, 2019". The month name in the date is localized. I am trying to write a regex in Swift that checks whether the string contains a date and, if it does, extracts only the date.

I have tried several regexes. My pattern seems to be wrong, but I can't figure out how to handle the localized month name in the date.

func extractDOB(memberInfo: String) -> [String] {
    var toReturn = [String]()
    let dobRegEx = "[0-9]{2}/s[a-zA-Z]/s[0-9]{4}"
    do {
        let regex = try NSRegularExpression(pattern: dobRegEx)
        let nsString = memberInfo as NSString
        let results = regex.matches(in: memberInfo, range: NSRange(location: 0, length: nsString.length))
        if results.count != 0 {
            for result in results {
                let matchRange = result.range
                toReturn.append(nsString.substring(with: matchRange))
                print(toReturn)
            }
        }
    } catch let error as NSError {
        print("invalid regex: \(error.localizedDescription)")
    }
    return toReturn
}

String: This is monday and date is 12 januari, 2019
Expected output: 12 januari, 2019

shilpa

2 Answers

You can use NSDataDetector:

let text = "Today is Monday and date is 12 januari 2019, which is 12 de enero de 2019 en Español, or 2019年1月12日 in 日本語."

let detector = try! NSDataDetector(types: NSTextCheckingResult.CheckingType.date.rawValue)
detector.enumerateMatches(in: text, range: NSRange(text.startIndex..., in: text)) { match, flags, stop in
    guard
        let match = match,
        let range = Range(match.range, in: text),
        let date = match.date else { return }
    print(text[range], "->", date)
}

Note that this returns five results: not just the three dates, but “Today” and “Monday”, too:

Today -> 2019-04-16 19:00:00 +0000
Monday -> 2019-04-22 19:00:00 +0000
12 januari 2019 -> 2019-01-12 20:00:00 +0000
12 de enero de 2019 -> 2019-01-12 20:00:00 +0000
2019年1月12日 -> 2019-01-12 20:00:00 +0000
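
If the relative words are unwanted, one possible filter (a minimal sketch, not part of the approach above) is to keep only matches whose text contains at least one digit. The helper name `explicitDateStrings(in:)` is hypothetical, and which dates are detected at all still depends on the platform and its installed languages.

```swift
import Foundation

// Hypothetical helper: returns the substrings NSDataDetector reports as dates,
// keeping only those that contain at least one digit. This is a heuristic
// assumption that drops relative words such as "Today" or "Monday".
func explicitDateStrings(in text: String) -> [String] {
    let types = NSTextCheckingResult.CheckingType.date.rawValue
    guard let detector = try? NSDataDetector(types: types) else { return [] }

    // NSRange is measured in UTF-16 code units, so derive it from the string's indices.
    let fullRange = NSRange(text.startIndex..., in: text)

    return detector.matches(in: text, range: fullRange).compactMap { match -> String? in
        guard match.date != nil, let range = Range(match.range, in: text) else { return nil }
        let matched = String(text[range])
        return matched.contains(where: { $0.isNumber }) ? matched : nil
    }
}

let sample = "Today is Monday and date is 12 januari 2019"
print(explicitDateStrings(in: sample))
```

The digit check is deliberately crude: it would also drop a date written entirely in words, so treat it as a starting point rather than a rule.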

While that is returning extra records, it’s doing a more rigorous date validation, too. For example, it will correctly determine that the following has no date strings in it:

let text = "The date is 12 foobar, 2019."

Note that the above may be affected by which languages have been added to the device. For example, the iOS simulator didn’t recognize the Japanese and Dutch dates until those languages were installed (though they didn’t have to be the current locale), whereas macOS appeared to recognize all of them out of the box. Bottom line: only expect it to reliably recognize dates for locales that the device has been configured to support.

Rob
  • It would be great if you could elaborate on the context in which you'd get the `"12 januari, 2019"` value printed out (e.g. changing the current locale) for future reference. – holex Apr 16 '19 at 07:05
  • The data detector is largely locale agnostic. You’d get that “12 januari, 2019” portion of the string regardless of your current locale. The only thing affected by current settings would be that the `date` values are interpreted using my timezone (I’m in California, explaining those `date` values above). – Rob Apr 16 '19 at 14:52
  • There must be something different in your context: I am seeing only _"Today"_, _"Monday"_, and _"12 de enero de 2019"_; the other two are never picked up or shown (I'm using the English locale). Interesting. – holex Apr 16 '19 at 14:54
  • The above is on macOS. It didn't recognize the Dutch or Japanese dates on my iOS simulator until I installed those languages on the simulator. It would appear to only detect languages that have been installed on the device in question. Needless to say, I’ve never installed Dutch on my macOS computer, so macOS must have awareness of more languages included in the base install. – Rob Apr 16 '19 at 15:05
  • This is not working properly for the string "userautomation@example.com | user82197 | 01 maart, 1994": it returns 01 maart, 2019 instead of 01 maart, 1994. – shilpa May 01 '19 at 20:49
  • @shilpa - Yep. If you look at the matched range, it’s only matching the “01 maart” portion. It’s not expecting a comma after the month and before the year. For example, “userautomation@example.com | user82197 | 01 maart 1994” works fine. – Rob May 01 '19 at 21:19
  • @Rob - Got it. Thanks – shilpa May 02 '19 at 14:43
  • @Rob My simulator supports the Dutch language, but with the string "AHMAD.ZURAIQI@EXAMPLE.NET | Verjaardag: 01 Januari" it does not find any matches even though there is one. The same code works for the string "QAT2T_740834@EXAMPLE.COM | Verjaardag: 01 Juni". Please help me out. – shilpa Aug 02 '19 at 15:43

You need to adjust the pattern of your regex a bit; then this clumsy snippet would suffice:

let input = "Today is Monday and date is 12 januari, 2019"

let dobRegEx = "([0-9]{2}\\s[a-zA-Z,]*?\\s[0-9]{4})"

// NSRange counts UTF-16 code units, so build it from the string's indices rather than from input.count.
if let regExp = try? NSRegularExpression(pattern: dobRegEx, options: .caseInsensitive),
    let firstMatch = regExp.firstMatch(in: input, range: NSRange(input.startIndex..., in: input)) {

    let dob = (input as NSString).substring(with: firstMatch.range) // = 12 januari, 2019
    // etc...
}

NOTE: you may also want to consider machine learning (ML) and train a model to recognise dates embedded in natural language, since a date can appear in various formats and languages (depending on the current locale) and you cannot write an effective regex that matches every scenario. That is a bit beyond this answer (and could be much ado about nothing in your case); however, you could start here if you are interested in that.
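
As a middle ground between a fixed pattern and ML, the pattern can be loosened and its matches validated afterwards. The following is only a sketch under stated assumptions: the month is a single word (matched with `\p{L}+` so that non-ASCII letters are accepted), the comma is optional, and the validation step assumes the Dutch locale `nl_NL` and the format `d MMMM, yyyy` from the question's sample. The helper names `extractDateCandidates(from:)` and `isValidDate(_:localeIdentifier:)` are hypothetical.

```swift
import Foundation

// Hypothetical helper: finds "<day> <month word>[,] <4-digit year>" candidates.
// \p{L}+ matches letters in any script, so month names such as "März" are allowed.
func extractDateCandidates(from text: String) -> [String] {
    let pattern = "\\b[0-9]{1,2}\\s+\\p{L}+,?\\s+[0-9]{4}\\b"
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }

    // NSRange is measured in UTF-16 code units, so derive it from the string's indices.
    let fullRange = NSRange(text.startIndex..., in: text)

    return regex.matches(in: text, range: fullRange).compactMap { match in
        Range(match.range, in: text).map { String(text[$0]) }
    }
}

// Hypothetical helper: the regex accepts any word as a month, so a candidate is
// confirmed by parsing it with a DateFormatter for an assumed locale and format.
func isValidDate(_ candidate: String, localeIdentifier: String) -> Bool {
    let formatter = DateFormatter()
    formatter.locale = Locale(identifier: localeIdentifier)
    formatter.dateFormat = "d MMMM, yyyy" // assumed layout, taken from the question's sample
    return formatter.date(from: candidate) != nil
}

let candidates = extractDateCandidates(from: "Today is Monday and date is 12 januari, 2019")
let confirmed = candidates.filter { isValidDate($0, localeIdentifier: "nl_NL") }
print(confirmed)
```

The regex alone would also accept "12 foobar, 2019", which is why the second step exists. Multi-word forms such as "12 de enero de 2019" are outside what this pattern covers.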

holex