I read some values from an MS SQL database and I'd like to perform some operations on the strings. Here is the code I am using to check whether one string starts with another:

String input = "Основното jавно обвинителство денеска поднесе пријава против БМ (59) од Битола заради постоење основи на сомнение дека сторил кривични дела „тешки дела против безбедноста на луѓето и имотот во сообраќајот“ и „неукажување помош на лице повредено во сообраќајна незгода“";
String subString = "Основното јавно обвинителство";
if (input.StartsWith(subString))
{
    Response.Write("OK");
}

However, input.StartsWith(subString) does not return true. Does anybody have an idea why?

alexmac
vikifor
As a recommendation: [Best Practices for Using Strings in the .NET Framework](http://msdn.microsoft.com/en-us/library/dd465121(v=vs.110).aspx) – Soner Gönül Dec 28 '13 at 16:41
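That guide recommends passing an explicit StringComparison to methods such as StartsWith. The sketch below writes the question's check that way, assuming the literals mirror the question (the \u0458 escape stands in for the Cyrillic ј so it survives copy-paste). It still prints False: StringComparison.Ordinal compares UTF-16 code units, and U+006A is not U+0458.

```csharp
using System;

class Program
{
    static void Main()
    {
        string input = "Основното jавно обвинителство";          // Latin j (U+006A)
        string subString = "Основното \u0458авно обвинителство"; // Cyrillic ј (U+0458)

        // Ordinal compares UTF-16 code units one by one, so U+006A != U+0458.
        bool startsWith = input.StartsWith(subString, StringComparison.Ordinal);
        Console.WriteLine(startsWith); // False
    }
}
```

Being explicit about the comparison removes culture-dependent surprises, but it cannot make two different characters equal.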

3 Answers

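The difference is in the character j at position 10 (zero-based): its code is 106 in input, but 1112 (0x458) in subString.

The j in your substring comes from Unicode page 4 (U+0400–U+04FF, the Cyrillic block):

Char  Decimal  Hex     UTF-8      Name
ј     1112     U+0458  0xD1 0x98  CYRILLIC SMALL LETTER JE

It looks the same as the Latin j (U+006A), but it is a different character with a different code.

Retyping the j in the substring, so that both strings use the same character, fixes the problem.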
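When you can't see which character differs, a small check can report the first mismatch and its code points. This is a sketch: FirstMismatch is a hypothetical helper, and the \u0458 escape is used so the Cyrillic ј is unambiguous in source code.

```csharp
using System;

class Program
{
    // Hypothetical helper: returns the index of the first char that differs,
    // or -1 if the strings are equal or one is a prefix of the other.
    public static int FirstMismatch(string a, string b)
    {
        int n = Math.Min(a.Length, b.Length);
        for (int i = 0; i < n; i++)
        {
            if (a[i] != b[i]) return i;
        }
        return -1;
    }

    static void Main()
    {
        string input = "Основното jавно обвинителство";          // Latin j (U+006A)
        string subString = "Основното \u0458авно обвинителство"; // Cyrillic ј (U+0458)

        int index = FirstMismatch(input, subString);
        if (index >= 0)
        {
            // Print both UTF-16 code units as U+XXXX
            Console.WriteLine($"Index {index}: U+{(int)input[index]:X4} vs U+{(int)subString[index]:X4}");
        }
        // Prints: Index 10: U+006A vs U+0458
    }
}
```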

Sergey Kalinichenko

The second words of input and subString don't match. Paste both strings into Notepad++ and select one word at a time: the first and last words of subString match, but the middle one does not.

This sample demonstrates the problem:

using System;

class Program
{
    static void Main()
    {
        var test = "Основното јавно обвинителство"; // Cyrillic ј (U+0458)
        var tost = "Основното jавно обвинителство"; // Latin j (U+006A)

        for (var i = 0; i < test.Length; i++)
        {
            Console.WriteLine(string.Format("1: {0}, 2: {1}, Equal: {2}", test[i], tost[i], test[i] == tost[i]));
            // Where the characters differ, also print their numeric codes
            if (test[i] != tost[i]) { Console.WriteLine(string.Format("1: {0}, 2: {1}", (int)test[i], (int)tost[i])); }
        }

        Console.WriteLine(test == tost); // False
    }
}

Relevant output:

1: ј, 2: j, Equal: False
1: 1112, 2: 106
Jeroen Vannevel
Bogdan Balas
Specifically, in the second word "jавно", the "input" string has a "j" as the first character; in "subString", the first character is "%d1%98" (the URL-encoded UTF-8 bytes of Cyrillic ј, U+0458). – Doug Knudsen Dec 28 '13 at 16:26

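The strings you posted are not equal. Compare their UTF-8 encodings:

using System;
using System.Text;
string s1 = "Основното јавно обвинителство"; // Cyrillic ј (U+0458)
string s2 = "Основното jавно обвинителство"; // Latin j (U+006A)
var bt = Encoding.UTF8.GetBytes(s1);
var bt_1 = Encoding.UTF8.GetBytes(s2);
Console.WriteLine(bt.Length);
Console.WriteLine(bt_1.Length);

The two byte arrays have these lengths: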

56
55

The difference is at byte index 19. In the first string, the "j" is encoded as two bytes:

[19]    209 byte
[20]    152 byte

whereas in the second string the "j" is a single byte:

[19]    106 byte

The first is ј (U+0458, CYRILLIC SMALL LETTER JE), encoded in UTF-8 as 0xD1 0x98; the second is j (U+006A, LATIN SMALL LETTER J), encoded as the single byte 0x6A.
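To see the bytes without a debugger, one option is to print each string's UTF-8 encoding as hex using BitConverter.ToString. A sketch, using only the word jавно to keep the output short; Utf8Hex is a hypothetical helper name:

```csharp
using System;
using System.Text;

class Program
{
    // Hypothetical helper: UTF-8 bytes of s as dash-separated uppercase hex, e.g. "6A-D0-B0".
    public static string Utf8Hex(string s) => BitConverter.ToString(Encoding.UTF8.GetBytes(s));

    static void Main()
    {
        string cyrillicJe = "\u0458авно"; // starts with CYRILLIC SMALL LETTER JE
        string latinJ = "jавно";          // starts with LATIN SMALL LETTER J

        Console.WriteLine(Utf8Hex(cyrillicJe)); // D1-98-D0-B0-D0-B2-D0-BD-D0-BE
        Console.WriteLine(Utf8Hex(latinJ));     // 6A-D0-B0-D0-B2-D0-BD-D0-BE
    }
}
```

The leading bytes D1-98 versus 6A are the same difference the debugger shows at index 19 above.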

Soner Gönül