Why does TextElementEnumerator not parse this Tamil Unicode text properly?
using System;
using System.Collections.Generic;
using System.Globalization;

namespace Glyphtest
{
    internal class Program
    {
        private static void Main()
        {
            // Second word is spelled with க + ெ + ள (U+0B95 U+0BC6 U+0BB3).
            const string unicodetxt1 = "ஊரவர் கெளவை";
            List<string> output = Syllabify(unicodetxt1);
            Console.WriteLine(output.Count);

            // Same word spelled with க + ௌ (U+0B95 U+0BCC).
            const string unicodetxt2 = "கௌவை";
            output = Syllabify(unicodetxt2);
            Console.WriteLine(output.Count);
        }

        // Splits the input into text elements as reported by TextElementEnumerator.
        // Returns null for null or empty input.
        public static List<string> Syllabify(string unicodetext)
        {
            if (string.IsNullOrEmpty(unicodetext)) return null;

            TextElementEnumerator enumerator = StringInfo.GetTextElementEnumerator(unicodetext);
            var data = new List<string>();
            while (enumerator.MoveNext())
                data.Add(enumerator.Current.ToString());
            return data;
        }
    }
}
The code sample above deals with the Tamil syllable 'கௌ', which appears in my text in two different encodings:
'கௌ' -> U+0B95 (க) + U+0BCC (ௌ) (correct form)
'கெள' -> U+0B95 (க) + U+0BC6 (ெ) + U+0BB3 (ள) (incorrect form)
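To make the difference between the two encodings visible, here is a minimal sketch that prints the code points of each text element instead of the glyphs. The class name CodePointDump and the helper Describe are made up for this illustration; only StringInfo.GetTextElementEnumerator and TextElementEnumerator.GetTextElement come from the framework. It assumes every character is in the BMP, which holds for the Tamil block.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper for this question, not part of any library.
internal static class CodePointDump
{
    // Formats each UTF-16 code unit of a text element as U+XXXX.
    // Tamil characters are all in the BMP, so one char is one code point here.
    public static string Describe(string element)
    {
        return string.Join(" ", element.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    private static void Main()
    {
        // Escapes are used so the two spellings cannot be confused visually.
        string[] samples =
        {
            "\u0B95\u0BCC\u0BB5\u0BC8",       // க + ௌ, then வ + ை
            "\u0B95\u0BC6\u0BB3\u0BB5\u0BC8"  // க + ெ + ள, then வ + ை
        };

        foreach (string sample in samples)
        {
            TextElementEnumerator e = StringInfo.GetTextElementEnumerator(sample);
            while (e.MoveNext())
                Console.WriteLine(Describe(e.GetTextElement()));
            Console.WriteLine("--");
        }
    }
}
```

Each printed line is one text element, so it is easy to see where the enumerator puts the boundaries in each spelling.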
Is this a bug in the TextElementEnumerator class? Why does it not enumerate the second form as a single text element?
That is, கெளவை should be enumerated as 'கெள' + 'வை' (expected),
but it is enumerated as 'கெ' + 'ள' + 'வை' (actual).
If this is a bug, how can I work around it?