I want to match the word "février", or any other month name, using a regular expression.

Regular expression:

^(JANVIER|FEVRIER|MARS|AVRIL|MAI|JUIN|JUILLET|AOUT|SEPTEMBRE|OCTOBRE|NOVEMBRE|DECEMBRE|Jan|Feb|Mar|Apr|May|Jun|JUN|Jul|Aug|Sep|Oct|Nov|Dec|[jJ]anvier|[Ff]évrier|[mM]ars|[aA]vril|[mM]ai|[jJ]uin|[jJ]uillet|[aA]o[éû]t|aout|[sS]eptembre|[oO]ctobre|[nN]ovembre|[dD][eé]cembre)$


Problem

The problem is that I cannot match words that contain non-ASCII letters such as à, é, è. According to a Unicode reference site, the code point of é is U+00E9 (\u00E9). Can I put this value in the regular expression, and more generally, how can I use Unicode values in regular expressions?


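#include <iostream>
#include <string>
#include <boost/regex.hpp>
using namespace std;

// Prints "found" if the input text contains "février".
// A narrow boost::regex compares bytes, so the text must use the same encoding as this source file.
void returnValue(string pattern)
{
    const boost::regex e("février");
    bool x = boost::regex_search(pattern, e);
    if (x) { cout << "found" << endl; }
}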
Hani Goc

1 Answer

You can match Unicode text with Boost.Regex. There are two ways to do it.

  1. Rely on wchar_t, if your platform's wchar_t can hold Unicode characters and your platform's C/C++ runtime handles wide character constants correctly. This approach has a few pitfalls and is not recommended; the linked documentation describes them.

  2. Use a Unicode-aware regular expression type (boost::u32regex). Boost.Regex has to be built with ICU support for this; see "Building With Unicode and ICU Support" in the Boost documentation.
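
As a rough illustration of the first (wide character) approach, here is a minimal sketch. It uses the standard library's std::wregex as a stand-in for boost::wregex, since the two expose a very similar interface; that substitution and the helper name contains_fevrier are assumptions made for this example and are not part of the original question. The é is written as the C++ universal character name \u00E9, so the compiler produces the wide character U+00E9 no matter how the source file is encoded.

```cpp
#include <regex>
#include <string>

// Returns true if `text` contains "février" or "Février".
// Hypothetical helper; std::wregex stands in for boost::wregex here.
bool contains_fevrier(const std::wstring& text)
{
    // \u00E9 is a universal character name: the compiler turns it into the
    // single wide character é (U+00E9), regardless of source file encoding.
    static const std::wregex re(L"[Ff]\u00E9vrier");
    return std::regex_search(text, re);
}
```

The input has to be a wide string as well, for example contains_fevrier(L"le 3 f\u00E9vrier 2014"). Text that arrives as UTF-8 bytes would first need converting to wchar_t, which is one of the pitfalls of this approach.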

http://www.boost.org/doc/libs/1_42_0/libs/regex/doc/html/boost_regex/unicode.html

4pie0
  • I found the following example in the Boost documentation (http://www.boost.org/doc/libs/1_43_0/libs/regex/doc/html/boost_regex/ref/non_std_strings/icu/unicode_iter.html). Do you know what this regular expression means? const char* re = "([[:Sc:]][[:Cf:][:Cc:][:Z*:]]*)?" – Hani Goc May 29 '14 at 12:20
  • These are ICU regexes, you need , but in order to use this header you will need the ICU library, and you will need to have built the Boost.Regex library with ICU support enabled. – 4pie0 May 29 '14 at 12:26