
I am trying to highlight some substring in a Thai text:

    high = high.Insert(myString.Index + myString.Length + "<b>".Length + currentLength, "</b>");

The issue is that `myString` contains a special Thai character ("เงินฝาก"). The string should have a length of 7, but the length is resolved as 6, so the text is highlighted only partially, without the last character.

I've tried encoding the strings (both `high` and `myString`), but it didn't work. I've also tried the `Replace` method, to no avail. Do you have any tips on how to handle this?

Thanks in advance!

corvuscorax
  • I tested `String.Length` with `เงินฝาก` and it returned `7`, not `6`. – UltimaWeapon Mar 02 '15 at 05:35
  • Hi, thanks for looking into it. It's possible that the characters got chopped up when I copied the string into the editor. Here is a screenshot of my code in debug mode: [link](http://i.imgur.com/NYpmQKh.png). Notice the watched variables. – user3307231 Mar 02 '15 at 07:42
  • How do you calculate the length? I cannot read Thai, but selecting one glyph at a time in my browser I count six glyphs in your string. Is there a combining character in there? – tripleee Mar 02 '15 at 08:06
  • I cannot read Thai either, so we are on the same page on this one. It looks like the first character is a combining one. I can see my cursor getting stuck in the middle of the first character when opening it in a different editor: [link](http://i.imgur.com/Z86fjBi.png). I also tried setting the thread to InvariantCulture, but it didn't help. The thread culture is set to Thai. – user3307231 Mar 02 '15 at 22:42
  • OK, so it seems the two strings are encoded differently. Here is a screenshot of a character-by-character comparison; apparently the third character is not the same: [character compare](http://i.imgur.com/EjhEuoe.png) – user3307231 Mar 03 '15 at 04:36
  • Do you use a regular expression to search for `เงินฝาก`? If so, it looks like the problem is in the regular expression pattern, since it does not capture the special Thai character. – UltimaWeapon Mar 05 '15 at 05:04
  • I'm Thai, and I can tell that your `searchedText` is the misspelled one. Could you give some snippet we can run to reproduce the issue? I tried to reproduce using `Regex` but to no avail. – tia Mar 10 '15 at 14:35

1 Answer


A simple approach is to skip the Thai marks written above or below a consonant (Unicode non-spacing marks) when counting characters, as in the example code below:

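    // Counts the characters in a Thai string, skipping non-spacing marks
    // (the vowel and tone marks drawn above or below a consonant).
    public int ThaiLength(string text)
    {
        int count = 0;

        for (int i = 0; i < text.Length; ++i)
        {
            // Only characters that occupy their own position are counted.
            if (char.GetUnicodeCategory(text[i]) != System.Globalization.UnicodeCategory.NonSpacingMark)
                ++count;
        }

        return count;
    }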
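
A related option, sketched here as an assumption about what the asker needs rather than a confirmed fix, is the framework's own `System.Globalization.StringInfo`, which counts text elements (a base character together with its combining marks) without a hand-written loop. The class name `TextElementDemo` and the helper `CountTextElements` are hypothetical names used only for illustration.

```csharp
using System;
using System.Globalization;

public static class TextElementDemo
{
    // Counts text elements (base character plus any combining marks)
    // instead of UTF-16 code units.
    public static int CountTextElements(string text)
    {
        return new StringInfo(text).LengthInTextElements;
    }

    public static void Main()
    {
        string word = "เงินฝาก";

        // String.Length counts UTF-16 code units, including the mark "ิ" (U+0E34).
        Console.WriteLine(word.Length);              // 7

        // The mark is grouped with the consonant it sits on.
        Console.WriteLine(CountTextElements(word));  // 6
    }
}
```

Keep in mind that `String.Insert` takes its index in UTF-16 code units, so a text-element count fits measuring what the reader sees, while offsets passed to `Insert` still have to be based on `String.Length`.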
ixhundred
  • -1, sorry: (1) the code looks like a blatant hack; (2) a correct approach should use `char.GetUnicodeCategory()`, not hardcoded values; (3) even for manual counting, it is sufficient to call `text.ToCharArray().Length` just once, with no need for a loop; (4) moreover, it is terribly inefficient to call `ToCharArray()` inside the loop. – Be Brave Be Like Ukraine Oct 28 '16 at 16:46
  • Very good! `char.GetUnicodeCategory()` can be used to count the length instead of a hardcoded text string. – ixhundred Jul 18 '17 at 02:48