Why does ICU4J return more letters in case-insensitive patterns?

Asked Sep 23 '22 at 04:36

The following code prints patterns, written in ICU's UnicodeSet syntax (which looks like a regex character class), that match the English exemplar characters.

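For matching only lowercase characters:

import com.ibm.icu.util.LocaleData;
import com.ibm.icu.util.ULocale;
System.out.println(LocaleData.getExemplarSet(ULocale.forLanguageTag("en-US"), 0).toPattern(true));
System.out.println(LocaleData.getExemplarSet(ULocale.forLanguageTag("en-US"), 0).toPattern(false));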

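For matching both uppercase and lowercase characters (passing UnicodeSet.CASE as the options argument):

import com.ibm.icu.text.UnicodeSet;
System.out.println(LocaleData.getExemplarSet(ULocale.forLanguageTag("en-US"), UnicodeSet.CASE).toPattern(true));
System.out.println(LocaleData.getExemplarSet(ULocale.forLanguageTag("en-US"), UnicodeSet.CASE).toPattern(false));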

And here is the output of all four calls together:

[a-z]
[a-z]
[A-Za-zſK]
[A-Za-z\u017F\u212A]
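The last two lines describe the same set; the only difference is that one spells the two non-ASCII members as escapes. That is what the boolean argument of toPattern controls (in ICU4J the parameter is named escapeUnprintable). Below is a minimal JDK-only sketch of that kind of escaping, assuming "unprintable" means anything outside U+0020..U+007E; escapeNonAscii is a hypothetical helper, not ICU's implementation.

```java
public class PatternEscapeSketch {
    // Hypothetical helper (not ICU code): keeps printable ASCII (U+0020..U+007E)
    // as-is and rewrites every other code point as a backslash-u escape with
    // four uppercase hex digits, or backslash-U with eight digits above the BMP.
    public static String escapeNonAscii(String pattern) {
        StringBuilder sb = new StringBuilder();
        pattern.codePoints().forEach(cp -> {
            if (cp >= 0x20 && cp <= 0x7E) {
                sb.appendCodePoint(cp);
            } else if (cp <= 0xFFFF) {
                sb.append(String.format("\\u%04X", cp));
            } else {
                sb.append(String.format("\\U%08X", cp));
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // The source literal below contains the long s and the Kelvin sign.
        System.out.println(escapeNonAscii("[A-Za-z\u017F\u212A]"));
    }
}
```

Running main prints the escaped spelling of the case-insensitive set, matching the fourth output line.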

Why does the latter (case-insensitive) pattern also include ſ (long s, U+017F) and K (Kelvin sign, U+212A)?
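The two extra characters are not arbitrary: in the Unicode Character Database, ſ (U+017F) has the simple uppercase mapping S, and the Kelvin sign K (U+212A) has the simple lowercase mapping k, so both are case-related to letters already in [a-z]. The sketch below uses only the JDK (not ICU4J), and its class and method names are hypothetical. It scans all code points for ones whose simple case mappings lead back to an ASCII letter. Simple mappings are only an approximation of whatever closure ICU applies for UnicodeSet.CASE: they also connect the Turkish İ (U+0130) and ı (U+0131) to i, and neither of those appears in the ICU output above.

```java
import java.util.ArrayList;
import java.util.List;

public class CaseVariantsSketch {
    // True for the ASCII lowercase letters a-z.
    static boolean isAsciiLower(int cp) {
        return cp >= 'a' && cp <= 'z';
    }

    // Hypothetical helper: every code point outside [A-Za-z] whose JDK simple
    // lowercase mapping, or lowercase-of-uppercase mapping, is an ASCII letter.
    // This approximates case closure with simple mappings; it is NOT ICU's algorithm.
    public static List<Integer> nonAsciiCaseVariants() {
        List<Integer> result = new ArrayList<>();
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            boolean asciiLetter = (cp >= 'A' && cp <= 'Z') || isAsciiLower(cp);
            if (asciiLetter) {
                continue; // already covered by [A-Za-z]
            }
            int lower = Character.toLowerCase(cp);                               // Kelvin sign -> k
            int lowerOfUpper = Character.toLowerCase(Character.toUpperCase(cp)); // long s -> S -> s
            if (isAsciiLower(lower) || isAsciiLower(lowerOfUpper)) {
                result.add(cp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (int cp : nonAsciiCaseVariants()) {
            int target = Character.toLowerCase(Character.toUpperCase(cp));
            System.out.printf("U+%04X %s relates to %c%n",
                    cp, new String(Character.toChars(cp)), (char) target);
        }
    }
}
```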

dzieciou
