3

I am receiving SMS messages in the Devanagari (Hindi) script from my mobile phone in my desktop program, but it displays the data in an encoded form (e.g. 091A09470924002009240924), which I found out is Unicode. Is there an existing library that will let me convert this to Hindi text? If not, how do I go about writing a method for this? I'm using C#.

Aakar
  • Just to add a note to your "is unicode": you are seeing Unicode code points, each represented by two bytes (four hex digits), i.e. your characters are 091A 0947 0924 0020 0924 0924. You can see the Devanagari code chart at http://www.unicode.org/charts/PDF/U0900.pdf. – borrible Jul 14 '11 at 08:21

3 Answers

1

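Use the System.Text.Encoding class; it has a GetChars(byte[]) method. Since your data arrives as hex digits, convert each pair of digits to a byte first, then decode the bytes as big-endian UTF-16 (Encoding.BigEndianUnicode). You will probably also need an appropriate font, since some Hindi symbols can be written in several ways.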
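A minimal sketch of that approach, assuming the SMS payload is a run of hex digits holding big-endian UTF-16 code units (which matches the sample 091A0947... in the question). The class and method names (HexSmsDecoder, Decode) are hypothetical; only Convert.ToByte and Encoding.BigEndianUnicode.GetChars are framework calls.

```csharp
using System;
using System.Text;

public static class HexSmsDecoder
{
    // Hypothetical helper: turns hex digits such as "091A0947" into text,
    // assuming each group of four digits is one big-endian UTF-16 code unit.
    public static string Decode(string hex)
    {
        if (hex == null) throw new ArgumentNullException(nameof(hex));
        if (hex.Length % 4 != 0)
            throw new FormatException("Expected a multiple of 4 hex digits.");

        // Two hex digits make one byte.
        var bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
        {
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        }

        // High byte comes first in the sample, so decode as big-endian UTF-16.
        return new string(Encoding.BigEndianUnicode.GetChars(bytes));
    }
}
```

Calling HexSmsDecoder.Decode("091A09470924002009240924") should give back the six characters U+091A U+0947 U+0924 U+0020 U+0924 U+0924, which a Devanagari-capable font can then render.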

Mike Mozhaev
1

Here's a code snippet I used for converting Georgian Unicode text to its Latin equivalent.

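using System;
using System.Text;

// Latin equivalents of the 33 Georgian letters U+10D0..U+10F0, in code-point order.
string[] charset = new string[33] { "a", "b", "g", "d", "e", "v", "z", "T", "i", "k", "l", "m", "n", "o", "p", "J", "r", "s", "t", "u", "f", "q", "R", "y", "S", "C", "c", "Z", "w", "W", "x", "j", "h" };
string unicodeString = "აბ, - გდ";
string latin_string = "";
// UTF-16LE: two bytes per character, low byte at the even index.
byte[] unicodeBytes = Encoding.Unicode.GetBytes(unicodeString);
for (int p = 0; p < unicodeBytes.Length / 2; p++)
{
    byte low = unicodeBytes[p * 2];
    byte high = unicodeBytes[p * 2 + 1];
    // Georgian letters have high byte 0x10 and a low byte in 208..240.
    if (high == 0x10 && low > 207 && low < 241)
        latin_string += charset[low - 208];
    else
        latin_string += unicodeString[p]; // keep every other character unchanged
}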

Explaining only the necessary part:

Encoding.Unicode.GetBytes(unicodeString); returns an array of bytes whose length is 2 * unicodeString.Length, so every letter of unicodeString is represented by a pair of bytes.
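The byte layout described above can be checked with a small sketch. The helper name (Utf16LayoutDemo.ToHexBytes) is made up for illustration; Encoding.Unicode.GetBytes and BitConverter.ToString are the framework calls doing the work.

```csharp
using System;
using System.Text;

public static class Utf16LayoutDemo
{
    // Hypothetical helper: returns the UTF-16LE bytes of a string
    // as space-separated hex pairs, low byte of each character first.
    public static string ToHexBytes(string text)
    {
        byte[] bytes = Encoding.Unicode.GetBytes(text);
        // BitConverter.ToString separates bytes with '-', e.g. "D0-10".
        return BitConverter.ToString(bytes).Replace('-', ' ');
    }
}
```

For the two-character string "ა," (U+10D0 followed by a comma) this gives "D0 10 2C 00": two bytes per character, with the low byte at the even index.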

The even indexes of unicodeBytes hold the low byte of each character, which is the value that identifies the letter you want to decode. The Georgian alphabet occupies U+10D0 to U+10F0, so its low bytes start at 208 and end at 240 (33 letters in total). If the value is in the range [208;240], I use the charset string array to get the Latin equivalent; otherwise (for ASCII characters such as the comma and the space) the value is simply the character code.
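The same mapping can also be sketched on char values directly, without going through a byte array. This is a variant rather than the code above: it compares whole code units against U+10D0..U+10F0, so characters outside that range pass through untouched. The class and method names are hypothetical; the lookup table is the one from the snippet.

```csharp
using System;
using System.Text;

public static class GeorgianTransliterator
{
    // Same table as in the snippet: Latin equivalents of U+10D0..U+10F0.
    private static readonly string[] Charset =
    {
        "a", "b", "g", "d", "e", "v", "z", "T", "i", "k", "l",
        "m", "n", "o", "p", "J", "r", "s", "t", "u", "f", "q",
        "R", "y", "S", "C", "c", "Z", "w", "W", "x", "j", "h"
    };

    private const char FirstLetter = '\u10D0'; // ა
    private const char LastLetter = '\u10F0';  // ჰ

    // Replaces each Georgian letter with its Latin equivalent and
    // copies every other character as-is.
    public static string ToLatin(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        var result = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (c >= FirstLetter && c <= LastLetter)
                result.Append(Charset[c - FirstLetter]);
            else
                result.Append(c);
        }
        return result.ToString();
    }
}
```

With the sample input from the snippet, GeorgianTransliterator.ToLatin("აბ, - გდ") returns "ab, - gd".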

I don't know if there is a library for it, but this method will give you a basic idea of how to write your own converter.

Nika G.
0

Thanks for the responses; they helped me find the exact solution: http://social.msdn.microsoft.com/Forums/en/netfxbcl/thread/12a3558d-fe48-44fd-840e-03facfd9c944

Aakar