I see a difference in behavior between C# (.NET v4.0) and Java for converting 'İ' to lowercase with "invariant" culture.

In Java, `"İ".toLowerCase(Locale.ROOT)` returns `i`.

In C#, `"İ".ToLowerInvariant()` and `"İ".ToLower(CultureInfo.InvariantCulture)` both return `"İ"`, but `"İ".ToLower(new CultureInfo("en-EN"))` returns `i`.

It looks like Java performs the conversion correctly and C# does not. Is this a bug in C#?

bittusarkar
  • C# is a large field. Which version of .NET / of the CLR do you use in your example? .NET 1.1, 2.0, 3.0, 3.5, 4.0, 4.5, 4.6? – Thomas Weller Jul 22 '16 at 09:43
  • Done. Added to the question. – bittusarkar Jul 22 '16 at 09:58
  • The `CultureInfo` value may affect the character set used when performing the conversion. English tends to use either ASCII or ISO-8859-1 by default, whereas `InvariantCulture` may use UTF-8 encoding, which includes the special Turkish `İ` character. – Tetsuya Yamamoto Jul 22 '16 at 10:07
  • Just to clarify, `ToLowerInvariant()` is the equivalent of `ToLower(CultureInfo.InvariantCulture)` so you'd get the same output for those. [Good post](http://haacked.com/archive/2012/07/05/turkish-i-problem-and-why-you-should-care.aspx/) related to the 'Turkish test' – keyboardP Jul 22 '16 at 10:28

1 Answer

Let's have a look. The letter in the question

İ

is in fact

U+0130: Latin Capital Letter I With Dot Above

(as quoted from Character Map). It seems reasonable, IMHO, that for the invariant culture (where we have no right to assume any specific culture, either English or Turkish) ToUpperInvariant() should return the letter itself (since it is already a capital), and ToLowerInvariant() should return something like

U+xxxx: Latin Small Letter I With Dot Above

However, we don't have such a letter:

https://en.wikipedia.org/wiki/Dotted_and_dotless_I

And since we don't have the letter required, all we can do is to leave the original one intact.
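The claim that no single lowercase letter exists can be checked from the Java side with Unicode normalization. The sketch below is only an illustration: the class and method names are made up for this example, and it assumes that "such a letter" would show up as a precomposed form under NFC. It composes `i` followed by U+0307 (combining dot above) and shows that the result is still two code points.

```java
import java.text.Normalizer;

public class DottedIComposition {
    // Canonical composition (NFC): merges base + combining marks when a
    // single precomposed code point exists for the sequence.
    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Canonical decomposition (NFD): splits precomposed characters.
    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        String smallIWithDot = "i\u0307"; // 'i' + COMBINING DOT ABOVE
        String capitalIWithDot = "\u0130"; // LATIN CAPITAL LETTER I WITH DOT ABOVE

        // No precomposed lowercase form exists, so NFC leaves two code points.
        System.out.println(nfc(smallIWithDot).length());   // prints 2

        // The capital letter, by contrast, is a single precomposed code point
        // that decomposes into 'I' + COMBINING DOT ABOVE.
        System.out.println(capitalIWithDot.length());       // prints 1
        System.out.println(nfd(capitalIWithDot).length());  // prints 2
    }
}
```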

When we use, say, the "en-EN" (English) culture, we are entitled to map Letter I With Dot Above to the plain English I and thus return i for ToLower().
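For comparison with Java, it helps to look at code points rather than at the rendered glyph, because an `i` followed by a combining dot looks almost identical to a plain `i`. The following is a minimal sketch, not a statement about .NET; the class name and the `codePoints` helper are hypothetical and exist only for this example. It lowercases U+0130 under `Locale.ROOT`, `Locale.ENGLISH`, and a Turkish locale.

```java
import java.util.Locale;
import java.util.stream.Collectors;

public class DottedILowerCase {
    // Render a string as space-separated code points, e.g. "U+0069 U+0307".
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format(Locale.ROOT, "U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String dottedI = "\u0130"; // LATIN CAPITAL LETTER I WITH DOT ABOVE
        Locale turkish = Locale.forLanguageTag("tr");

        // Outside Turkish-style locales, Java applies the full Unicode case
        // mapping: 'i' followed by COMBINING DOT ABOVE (two code points).
        System.out.println(codePoints(dottedI.toLowerCase(Locale.ROOT)));    // U+0069 U+0307
        System.out.println(codePoints(dottedI.toLowerCase(Locale.ENGLISH))); // U+0069 U+0307

        // In a Turkish locale the result is the plain small letter i.
        System.out.println(codePoints(dottedI.toLowerCase(turkish)));        // U+0069
    }
}
```

So with `Locale.ROOT` Java does not return a bare `i`: it returns a two-code-point sequence that merely renders like one, which is a different choice from leaving the letter unchanged.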

Dmitry Bychenko
  • I understand this. What I did not understand is how Java is able to convert it with `Locale.ROOT`. Do the above case conversion rules not apply to Java? Is this a bug in Java? – bittusarkar Jul 22 '16 at 10:49
  • http://stackoverflow.com/questions/11063102/using-locales-with-javas-tolowercase-and-touppercase – Dmitry Bychenko Jul 22 '16 at 11:56