I'd have thought that's what the LengthInTextElements property was for. The MSDN says this property is:

The number of base characters, surrogate pairs, and combining character sequences in this StringInfo object.

So it sure looks like it should count combining sequences as a single character. But either it doesn't work or I am fundamentally misunderstanding something. This crappy test program ...

static void Main(string[] args)
{
    // U+0301 (combining acute accent) followed by U+0065 ('e')
    string foo = "\u0301\u0065";
    Console.WriteLine(string.Format("String:\t{0}", foo));
    Console.WriteLine(string.Format("Length:\t{0}", foo.Length));
    Console.WriteLine(string.Format("TextElements:\t{0}", new StringInfo(foo).LengthInTextElements));
    Console.ReadLine();
}

generates this output...

String: `e
Length: 2
TextElements: 2

I would dearly love to count the combining sequence "\u0301\u0065" as a single character. Can this be done with StringInfo?


Well, I figured out what I was doing wrong and it's somewhat embarrassing. I was reversing the order of the character and the diacritic: a combining mark has to follow the base character it modifies, not precede it. So making the following ever so tiny change corrects the problem:

static void Main(string[] args)
{
    // U+0065 ('e') followed by U+0301 (combining acute accent): base character first
    string foo = "\u0065\u0301";
    Console.WriteLine(string.Format("String:\t{0}", foo));
    Console.WriteLine(string.Format("Length:\t{0}", foo.Length));
    Console.WriteLine(string.Format("TextElements:\t{0}", new StringInfo(foo).LengthInTextElements));
    Console.ReadLine();
}

So ... it was just a matter of correctly encoding my test data.
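
For anyone who wants to see where the text element boundaries actually fall, here is a minimal sketch that compares the two orderings and walks the elements with StringInfo.GetTextElementEnumerator. The class name TextElementDemo and the helper CountTextElements are made up for illustration; only the StringInfo and TextElementEnumerator members come from the framework.

```csharp
using System;
using System.Globalization;

public static class TextElementDemo
{
    // Hypothetical helper: counts text elements (base characters, surrogate
    // pairs, combining sequences) instead of UTF-16 code units.
    public static int CountTextElements(string s)
    {
        return new StringInfo(s).LengthInTextElements;
    }

    public static void Main()
    {
        string baseFirst = "\u0065\u0301"; // 'e' then combining acute accent
        string markFirst = "\u0301\u0065"; // combining acute accent then 'e'

        Console.WriteLine(CountTextElements(baseFirst)); // 1
        Console.WriteLine(CountTextElements(markFirst)); // 2

        // Walk the elements of the well-formed string one at a time.
        TextElementEnumerator elements = StringInfo.GetTextElementEnumerator(baseFirst);
        while (elements.MoveNext())
        {
            string element = elements.GetTextElement();
            Console.WriteLine("index {0}: {1} code unit(s)", elements.ElementIndex, element.Length);
        }
    }
}
```

With the mark first, the accent has no preceding base character to attach to, so it is counted as an element of its own and the 'e' as another.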

mcwyrm

1 Answer

I don't think this can be done with StringInfo; it does more than just count combining characters. You could easily write an extension method to do what you want. Something like:

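using System.Linq;

public static class StringExtensions
{
    /// <summary>
    /// Counts the characters in the string that are combining
    /// diacritical marks (U+0300 through U+036F).
    /// </summary>
    /// <param name="input">The string to check.</param>
    /// <returns>The number of combining characters found.</returns>
    public static int NumberOfCombiningCharacters(this string input)
    {
        // 768..879 is U+0300..U+036F expressed as decimal integers
        return input.Count(c => c >= 768 && c <= 879);
    }
}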

And then call the extension method:

string foo = "\u0301\u0065";
int a = foo.NumberOfCombiningCharacters();
Kevin
  • Just so you know I used 0300 thru 036f for the range and converted them to ints. If you need a different range it should be easy to figure out. – Kevin Oct 07 '14 at 22:03
  • I must not have presented the question correctly. I have no interest in knowing how many combining sequences are in a string; I'm interested in correctly counting the number of characters in a string that contains combining sequences, where something like ́e̽ is counted as a single character. That is, in fact, what StringInfo.LengthInTextElements is supposed to do. – mcwyrm Oct 08 '14 at 12:12
  • Sorry, I misunderstood your question - glad you figured it out! – Kevin Oct 08 '14 at 14:51