
I'm looking for a way to convert a Unicode UTF-32 code point (an int) to lowercase. In Java, something like this would do the trick:

Character.toChars(Character.toLowerCase(Character.codePointAt(text, i)))

I have the UTF-32 value from Char.ConvertToUtf32, but there doesn't seem to be a way to lowercase it.

UPDATE: I'm dealing with a stream/array of chars. I've found the code points by looking for the high surrogate, somewhat like the Java snippet above. Converting back and forth to String is going to be too inefficient.
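For reference, the C# counterpart of walking the array with `Character.codePointAt` might look like the sketch below. It assumes mostly well-formed UTF-16 input and passes an unpaired surrogate through as its own value rather than rejecting it; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class CodePointScanner
{
    // Walks a UTF-16 char array and returns one UTF-32 code point per character,
    // combining a high surrogate with the low surrogate that follows it.
    // Unpaired surrogates are returned unchanged (an assumption; you may prefer to throw).
    public static List<int> CodePoints(char[] chars)
    {
        var result = new List<int>(chars.Length);
        for (int i = 0; i < chars.Length; i++)
        {
            char c = chars[i];
            if (char.IsHighSurrogate(c) && i + 1 < chars.Length && char.IsLowSurrogate(chars[i + 1]))
            {
                // The (char, char) overload avoids building a string first.
                result.Add(char.ConvertToUtf32(c, chars[i + 1]));
                i++; // skip the low surrogate that was just consumed
            }
            else
            {
                result.Add(c);
            }
        }
        return result;
    }
}
```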

Kyle Hale
Scott
  • Is it possible to get at the bytes that make up the UTF-32 data? – Tim Lloyd Dec 30 '10 at 01:10
  • Yes, I have the array of chars. – Scott Dec 30 '10 at 01:18
  • The problem is that I really don't want to convert back and forth to strings to get this. Of course, I could look for the surrogate and convert only if one is present. But still, there ought to be a way to do a case conversion directly on UTF-32. – Scott Dec 30 '10 at 01:37
  • Even with your preferred solution, you will be converting everything from chars to ints and back to chars again. What's the big deal with converting your char array into a string in one go? – Tim Lloyd Dec 30 '10 at 01:49
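Converting the whole array in one go, as the last comment suggests, takes only a couple of calls. A sketch, assuming invariant casing is acceptable; the class name is hypothetical, and whether supplementary-plane characters get lowercased depends on the runtime's globalization support.

```csharp
using System;

static class ArrayCasing
{
    // Builds one string from the whole array, lowercases it once, and copies the
    // result back out. ToLowerInvariant avoids culture-specific rules (for example
    // Turkish dotted/dotless i); use ToLower(CultureInfo) if you need them.
    public static char[] ToLowerInvariant(char[] chars)
    {
        return new string(chars).ToLowerInvariant().ToCharArray();
    }
}
```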

1 Answer


The only built-in way to do this is to convert the UTF-32 value to a String. Something like the following should work:

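using System;

static class Utf32Casing
{
    // Lowercases a single UTF-32 code point by round-tripping through a UTF-16 string.
    // Throws ArgumentOutOfRangeException if c is a surrogate code point or above 0x10FFFF.
    public static int ToLower(int c)
    {
        // Convert the UTF-32 code point to a UTF-16 string (one or two chars).
        var strC = Char.ConvertFromUtf32(c);

        // Casing rules depend on the culture (ToLower uses the current culture).
        // Consider using ToLowerInvariant() for culture-independent results.
        var lower = strC.ToLower();

        // Convert the UTF-16 string back to a UTF-32 code point and return it.
        return Char.ConvertToUtf32(lower, 0);
    }
}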

You indicate that this is inefficient for your needs. Have you benchmarked it?
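One rough way to answer that question is a Stopwatch timing of both approaches on representative data. This sketch makes several assumptions: it uses ToLowerInvariant rather than the culture-sensitive ToLower, it expects well-formed UTF-16 input (ConvertFromUtf32 throws on an unpaired surrogate), and the class and method names are hypothetical. For numbers you can rely on, a dedicated harness such as BenchmarkDotNet is a better choice.

```csharp
using System;
using System.Diagnostics;
using System.Text;

static class CasingBenchmark
{
    // Per-code-point approach from the answer: round-trip through a small string.
    static int ToLowerViaString(int c)
    {
        return char.ConvertToUtf32(char.ConvertFromUtf32(c).ToLowerInvariant(), 0);
    }

    // Lowercases a char array one code point at a time (assumes well-formed UTF-16).
    public static string PerCodePoint(char[] chars)
    {
        var sb = new StringBuilder(chars.Length);
        for (int i = 0; i < chars.Length; i++)
        {
            int cp;
            if (char.IsHighSurrogate(chars[i]) && i + 1 < chars.Length && char.IsLowSurrogate(chars[i + 1]))
            {
                cp = char.ConvertToUtf32(chars[i], chars[i + 1]);
                i++; // consumed the low surrogate as well
            }
            else
            {
                cp = chars[i];
            }
            sb.Append(char.ConvertFromUtf32(ToLowerViaString(cp)));
        }
        return sb.ToString();
    }

    // Lowercases the whole array with a single string conversion.
    public static string WholeString(char[] chars)
    {
        return new string(chars).ToLowerInvariant();
    }

    // Times one approach over several iterations; a rough measurement only.
    public static TimeSpan Time(Func<char[], string> lower, char[] input, int iterations)
    {
        lower(input); // warm-up so JIT compilation is not included in the timing
        var sw = Stopwatch.StartNew();
        for (int n = 0; n < iterations; n++)
        {
            lower(input);
        }
        sw.Stop();
        return sw.Elapsed;
    }
}
```

You would then compare `CasingBenchmark.Time(CasingBenchmark.PerCodePoint, data, 100)` against `CasingBenchmark.Time(CasingBenchmark.WholeString, data, 100)` on an input that resembles your real stream.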

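If you still insist on doing case conversion directly on UTF-32, you will need to roll your own. Luckily, the Unicode Consortium has done most of the hard work. Take a look at the Unicode case folding file (CaseFolding.txt). Parse it, storing the data in an appropriate structure such as a map from code point to code point, and then do the casing directly against that structure, with your data in whatever format you prefer. Note that case folding is not identical to lowercasing for every character; if you need strict lowercase mappings, the simple lowercase mapping field in UnicodeData.txt is the closer match.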
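A minimal sketch of that approach, assuming the CaseFolding.txt line layout from the Unicode Character Database (for example `0041; C; 0061; # LATIN CAPITAL LETTER A`, with `#` starting a comment). It keeps only the C (common) and S (simple) entries, so every code point maps to exactly one code point; F (full, multi-character) and T (Turkic) entries are skipped. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

sealed class CaseFoldingTable
{
    private readonly Dictionary<int, int> map = new Dictionary<int, int>();

    // Reads CaseFolding.txt-formatted lines: "code; status; mapping; # name".
    public static CaseFoldingTable Load(TextReader reader)
    {
        var table = new CaseFoldingTable();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Drop trailing comments and skip blank or comment-only lines.
            int hash = line.IndexOf('#');
            if (hash >= 0) line = line.Substring(0, hash);
            if (line.Trim().Length == 0) continue;

            string[] fields = line.Split(';');
            if (fields.Length < 3) continue;

            // C = common, S = simple: both are one-to-one mappings.
            string status = fields[1].Trim();
            if (status != "C" && status != "S") continue;

            int from = int.Parse(fields[0].Trim(), NumberStyles.HexNumber, CultureInfo.InvariantCulture);
            int to = int.Parse(fields[2].Trim(), NumberStyles.HexNumber, CultureInfo.InvariantCulture);
            table.map[from] = to;
        }
        return table;
    }

    // Folds one UTF-32 code point; code points without an entry map to themselves.
    public int Fold(int codePoint)
    {
        int folded;
        return map.TryGetValue(codePoint, out folded) ? folded : codePoint;
    }
}
```

In practice you would load the table once, e.g. `CaseFoldingTable.Load(File.OpenText("CaseFolding.txt"))`, and call `Fold` on each code point as you decode it from the char array, so no intermediate strings are created.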

Dono