6

I'm just curious: how does .ToUpper() work? Is there some sort of mapping, such that a lowercase a has UTF code XYZ and the uppercase one has UTF code XYZ1?

Simon Edström

4 Answers

3

Yes, it's making use of the Unicode metadata. Every character (Unicode code point) has a case as well as case mapping to upper- and lowercase (and title case). .NET uses this information to convert a string to upper- or lowercase. You can find the very same information in the Unicode Character Database.
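To make the mapping idea concrete, here is a minimal sketch that prints the code points before and after uppercasing a few characters. The class name CaseMappingDemo and the helper CodePoint are made up for illustration; only char.ToUpperInvariant is a real .NET call. It is meant to show that the mapping is a table lookup rather than a fixed offset between code points.

```csharp
using System;

static class CaseMappingDemo
{
    // Hypothetical helper: formats a char as a Unicode code point, e.g. 'a' -> "U+0061".
    public static string CodePoint(char c) => "U+" + ((int)c).ToString("X4");

    public static void Main()
    {
        // Escapes are used so the example does not depend on the source file encoding.
        char[] samples = { 'a', '\u00E9', '\u00FF' }; // a, e with acute, y with diaeresis

        foreach (char lower in samples)
        {
            // Culture-independent mapping taken from the runtime's casing data.
            char upper = char.ToUpperInvariant(lower);
            int offset = upper - lower;
            Console.WriteLine($"{CodePoint(lower)} -> {CodePoint(upper)} (offset {offset})");
        }
    }
}
```

For U+0061 and U+00E9 the uppercase form sits 32 code points lower, but U+00FF maps to U+0178, so a simple "XYZ to XYZ1" rule does not hold in general.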

Joey
0

String.ToUpper() simply uses the CurrentCulture at its core.

From the disassembled version of String.ToUpper(CultureInfo) in mscorlib.dll, you can see this:

public string ToUpper(CultureInfo culture)
{
    if (culture == null)
    {
        throw new ArgumentNullException("culture");
    }
    return culture.TextInfo.ToUpper(this);
}

So the result depends on your current culture. There is also an overload where you can specify an alternative culture.
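As a rough illustration of that culture dependence, the sketch below uppercases "i" with the invariant culture and with Turkish (tr-TR). The class name CultureCasingDemo and the helper CodePoints are hypothetical. It assumes the runtime has full globalization data available; in an invariant-globalization setup the Turkish culture may be unavailable or behave like the invariant one.

```csharp
using System;
using System.Globalization;
using System.Linq;

static class CultureCasingDemo
{
    // Hypothetical helper: lists the UTF-16 code units of a string, e.g. "I" -> "U+0049".
    public static string CodePoints(string s) =>
        string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")));

    public static void Main()
    {
        // Invariant culture: "i" becomes the plain Latin capital I.
        string invariant = "i".ToUpper(CultureInfo.InvariantCulture);

        // Turkish culture: with full globalization data, "i" becomes
        // the dotted capital I (U+0130) instead.
        string turkish = "i".ToUpper(new CultureInfo("tr-TR"));

        Console.WriteLine("invariant: " + CodePoints(invariant));
        Console.WriteLine("tr-TR:     " + CodePoints(turkish));
    }
}
```

This is why code that compares identifiers or keys usually calls ToUpperInvariant() (or passes an explicit culture) rather than relying on whatever the current culture happens to be.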

EDIT

Internally, it ends up calling the nativeChangeCaseString function, which has a native implementation. How that is implemented internally, I have no idea; that can only be answered by the person who developed it.

As suggested by @Tim, here is a link to TextInfo.ToUpper, which provides some more information on the subject.

Tigran
  • 3
    I'm not sure that really answers the question though. Saying "it calls this method internally" without information on what that method does. – George Duckett Jul 12 '12 at 11:54
  • @GeorgeDuckett: I edited my post, but as I wrote, the **concrete** implementation can only be explained by the person who developed that function. – Tigran Jul 12 '12 at 11:59
  • 1
    @Tigran: You might want to add the link to [`TextInfo.ToUpper`](http://msdn.microsoft.com/en-us/library/fsc2y169), since it has some more information, for instance that the returned string might differ in length from the input string, which proves OP's mapping approach wrong. – Tim Schmelter Jul 12 '12 at 12:10
0

This has been asked before (in a roundabout way) on StackOverflow. Granted, it's not about C# or .NET, but it answers the Unicode part of this question.

How do you set strings to uppercase / lowercase in Unicode?

Community
Dai
0

If you are interested in the design aspects of the ToUpper() implementation, consider the following points:

  • The Flyweight design pattern from the Gang of Four design pattern catalog is used to handle character-related functionality
  • Under this design pattern, each unit in a collection is designed as an object with defined behavior, and the final object is a collection of these smaller units
  • In the case of String, the given string is handled as an array of characters, where each character is an object with defined behavior
  • Following this design pattern, a call to ToUpper() iterates over the characters of the string and internally delegates the call to each character. While calling ToUpper on a character, the String class also passes a reference to the Locale, which contains details of the character map and encoding
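The per-character delegation described above can be sketched in C# as follows. This is only an illustration of the idea, not the actual .NET or Java implementation; the class name PerCharUpper and the method ToUpperPerChar are made up, while TextInfo.ToUpper(char) is the real per-character API that takes its casing rules from the given culture.

```csharp
using System;
using System.Globalization;
using System.Text;

static class PerCharUpper
{
    // Hypothetical sketch: uppercases a string one UTF-16 code unit at a time,
    // delegating each character to the culture's TextInfo.
    public static string ToUpperPerChar(string s, CultureInfo culture)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        if (culture == null) throw new ArgumentNullException(nameof(culture));

        var result = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            // The culture (the "locale") supplies the character mapping.
            result.Append(culture.TextInfo.ToUpper(c));
        }
        return result.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(ToUpperPerChar("hello, world", CultureInfo.InvariantCulture));
    }
}
```

A loop like this handles each UTF-16 code unit in isolation, so it cannot deal with characters stored as surrogate pairs or with mappings that change the string length; that is one reason real implementations work on the whole string.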

If you are interested in an actual implementation, you can refer to the open source implementation of the java.lang.String class, part of the Java language; it is the equivalent of the C# string class.

Below is a link to the source code of the java.lang.String class. There are two overloaded methods: toUpperCase() and toUpperCase(Locale). Internally, toUpperCase() calls toUpperCase(Locale) with the default locale, so the second method will be of interest to you.

http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/String.java

Hope this information helps.

Rutesh Makhijani