I found that a case-insensitive regex like /(?i)À/ doesn't match the same letter in lowercase, like à.

I checked this in code and confirmed that "À".toLowerCase().equals("à") is true.

Does case-insensitive regex work only for English (or Latin) characters?

Here's a code sample that is supposed to return true, but returns false:

Pattern.compile("À", Pattern.CASE_INSENSITIVE).matcher("à").matches()
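A minimal, self-contained reproduction of the behaviour above, assuming Java 7 or later. The class and method names are illustrative. The letters are written as Unicode escapes (\u00C0 for À, \u00E0 for à) so the example does not depend on the source file encoding, and toLowerCase(Locale.ROOT) is used so the result does not depend on the default locale.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Illustrative reproduction: String case mapping treats U+00C0 and U+00E0
// as an upper/lower case pair, but CASE_INSENSITIVE alone does not.
final class ReproduceIssue {

    // String.toLowerCase applies the Unicode case mappings.
    static boolean lowerCaseEquals() {
        return "\u00C0".toLowerCase(Locale.ROOT).equals("\u00E0");
    }

    // CASE_INSENSITIVE without UNICODE_CASE: only US-ASCII letters are folded.
    static boolean regexMatches() {
        return Pattern.compile("\u00C0", Pattern.CASE_INSENSITIVE).matcher("\u00E0").matches();
    }

    public static void main(String[] args) {
        System.out.println("toLowerCase equal: " + lowerCaseEquals()); // true
        System.out.println("regex matches:     " + regexMatches());    // false
    }
}
```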

ikryvorotenko

1 Answer


In Java, you can specify both the Pattern.CASE_INSENSITIVE and Pattern.UNICODE_CASE flags, e.g.:

final Pattern pattern = Pattern.compile("À", Pattern.CASE_INSENSITIVE | 
                                             Pattern.UNICODE_CASE);
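A small runnable sketch of the fix, assuming Java 7 or later; the class name, the constant A_GRAVE and both helper methods are illustrative. It compiles the pattern once with both flags and uses it for a whole-input match (as in the question) and for a search inside a longer string.

```java
import java.util.regex.Pattern;

final class UnicodeCaseFix {

    // CASE_INSENSITIVE enables case folding; UNICODE_CASE makes that folding
    // follow Unicode rules instead of being limited to US-ASCII.
    private static final Pattern A_GRAVE =
            Pattern.compile("\u00C0", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    // Whole-input match, as in the question.
    static boolean matchesWhole(String input) {
        return A_GRAVE.matcher(input).matches();
    }

    // Search for the letter anywhere in the input.
    static boolean occursIn(String input) {
        return A_GRAVE.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("\u00E0"));   // true
        System.out.println(occursIn("voil\u00E0"));   // true
        System.out.println(matchesWhole("a"));        // false: case folding does not strip accents
    }
}
```

Note that case folding only relates upper- and lowercase forms of the same letter; an accented letter still does not match its unaccented base letter.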

The Javadoc for Pattern.UNICODE_CASE explains:

When this flag is specified then case-insensitive matching, when enabled by the CASE_INSENSITIVE flag, is done in a manner consistent with the Unicode Standard. By default, case-insensitive matching assumes that only characters in the US-ASCII charset are being matched.
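To make the "US-ASCII only" default concrete, here is a sketch (assuming Java 7 or later; class and helper names are illustrative) that compares an ASCII letter, a Latin-1 letter (U+00C0/U+00E0) and a Cyrillic letter (U+0414/U+0434) with and without UNICODE_CASE. It also answers the original question: the limitation is ASCII, not "English or Latin", and Cyrillic behaves the same way.

```java
import java.util.regex.Pattern;

final class AsciiOnlyDefault {

    // CASE_INSENSITIVE only.
    static boolean ci(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).matches();
    }

    // CASE_INSENSITIVE together with UNICODE_CASE.
    static boolean ciUnicode(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                .matcher(input).matches();
    }

    public static void main(String[] args) {
        // US-ASCII letters are folded by CASE_INSENSITIVE alone.
        System.out.println(ci("A", "a"));                    // true
        // Latin-1 and Cyrillic letters are outside US-ASCII, so they are compared exactly...
        System.out.println(ci("\u00C0", "\u00E0"));          // false (A-grave vs a-grave)
        System.out.println(ci("\u0414", "\u0434"));          // false (Cyrillic De)
        // ...until UNICODE_CASE is added.
        System.out.println(ciUnicode("\u00C0", "\u00E0"));  // true
        System.out.println(ciUnicode("\u0414", "\u0434"));  // true
    }
}
```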

Alternatively, you can use an embedded flag expression, e.g.:

final Pattern pattern = Pattern.compile("(?iu)À");
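A short sketch of embedded flags, assuming Java 7 or later (class and helper names are illustrative). A leading (?iu) applies to the rest of the pattern, while the (?iu:X) form, documented in the Pattern Javadoc as a non-capturing group with the given flags, limits them to that group. (?i) on its own behaves like CASE_INSENSITIVE alone.

```java
import java.util.regex.Pattern;

final class EmbeddedFlags {

    // Compile without any flags argument; all flags come from the pattern text.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // (?iu) at the start applies both flags to the whole pattern.
        System.out.println(matches("(?iu)\u00C0", "\u00E0"));     // true
        // (?iu:...) scopes the flags to the group, so 'B' stays case-sensitive.
        System.out.println(matches("(?iu:\u00C0)B", "\u00E0B"));  // true
        System.out.println(matches("(?iu:\u00C0)B", "\u00E0b"));  // false
        // (?i) alone is ASCII-only, like CASE_INSENSITIVE without UNICODE_CASE.
        System.out.println(matches("(?i)\u00C0", "\u00E0"));      // false
    }
}
```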

The flags argument of Pattern.compile(String, int) is a bit mask that can include CASE_INSENSITIVE, MULTILINE, DOTALL, UNICODE_CASE, CANON_EQ, UNIX_LINES, LITERAL, UNICODE_CHARACTER_CLASS and COMMENTS. Not all of these have an embedded equivalent.
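Because the flags are bits, several are combined with the | operator. A sketch, assuming Java 7 or later, that combines LITERAL (which can only be passed this way) with the two case flags; per the Pattern Javadoc, CASE_INSENSITIVE and UNICODE_CASE keep their effect under LITERAL. The class name and helper are illustrative.

```java
import java.util.regex.Pattern;

final class CombinedFlags {

    // LITERAL has no embedded form, so it must go in the flags argument.
    private static final int FLAGS =
            Pattern.LITERAL | Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;

    static boolean literalMatch(String pattern, String input) {
        return Pattern.compile(pattern, FLAGS).matcher(input).matches();
    }

    public static void main(String[] args) {
        // '.' is a plain dot under LITERAL, and Unicode case folding still applies.
        System.out.println(literalMatch("\u00C0.B", "\u00E0.b"));  // true
        System.out.println(literalMatch("\u00C0.B", "\u00E0xb"));  // false
    }
}
```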

Here is the list of flags, each with its corresponding embedded flag expression (if one exists):

+-------------------------+----------+
| Flag                    | Embedded |
+-------------------------+----------+
| UNIX_LINES              | (?d)     |
| CASE_INSENSITIVE        | (?i)     |
| COMMENTS                | (?x)     |
| MULTILINE               | (?m)     |
| LITERAL                 |          |
| DOTALL                  | (?s)     |
| UNICODE_CASE            | (?u)     |
| CANON_EQ                |          |
| UNICODE_CHARACTER_CLASS | (?U)     |
+-------------------------+----------+
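The table can also be read as a lookup from flag constant to embedded letter. The helper toEmbeddedPrefix below is hypothetical (it is not part of java.util.regex) and is only a sketch assuming Java 7 or later: it builds the embedded prefix for a flags mask in the table's order and rejects LITERAL and CANON_EQ, which have no embedded form.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper: converts a flags bit mask to an embedded flag prefix.
final class EmbeddedFlagTable {

    // Flag constant -> embedded letter, in the same order as the table.
    private static final Map<Integer, Character> EMBEDDED = new LinkedHashMap<>();
    static {
        EMBEDDED.put(Pattern.UNIX_LINES, 'd');
        EMBEDDED.put(Pattern.CASE_INSENSITIVE, 'i');
        EMBEDDED.put(Pattern.COMMENTS, 'x');
        EMBEDDED.put(Pattern.MULTILINE, 'm');
        EMBEDDED.put(Pattern.DOTALL, 's');
        EMBEDDED.put(Pattern.UNICODE_CASE, 'u');
        EMBEDDED.put(Pattern.UNICODE_CHARACTER_CLASS, 'U');
    }

    // Flags that have no embedded form.
    private static final int NO_EMBEDDED_FORM = Pattern.LITERAL | Pattern.CANON_EQ;

    static String toEmbeddedPrefix(int flags) {
        if ((flags & NO_EMBEDDED_FORM) != 0) {
            throw new IllegalArgumentException("LITERAL and CANON_EQ have no embedded flag");
        }
        StringBuilder letters = new StringBuilder();
        int remaining = flags;
        for (Map.Entry<Integer, Character> e : EMBEDDED.entrySet()) {
            if ((flags & e.getKey()) != 0) {
                letters.append(e.getValue());
                remaining &= ~e.getKey();
            }
        }
        if (remaining != 0) {
            throw new IllegalArgumentException("Unknown flag bits: " + remaining);
        }
        return letters.length() == 0 ? "" : "(?" + letters + ")";
    }

    public static void main(String[] args) {
        String prefix = toEmbeddedPrefix(Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        System.out.println(prefix);  // (?iu)
        System.out.println(Pattern.compile(prefix + "\u00C0").matcher("\u00E0").matches()); // true
    }
}
```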
Paul Vargas