I have the string "Ärger,-Ökonom-i-Übermut-ẞ-ß", and when I run IndexOf("--") on it I get a result of 23. If I use Replace on the same string, nothing gets replaced.

I don't understand what is happening, so can someone please shed some light on this issue? The application culture is set to Croatian, not German, and the framework version is 3.5.

Changing the culture to German (de-DE) doesn't change this strange behavior.

Abhranil Das
Antonio Bakula
  • Is it correct that there is no "--" in the string? – Volker Mauel Feb 13 '12 at 12:43
  • I would say so. Somehow IndexOf is treating ẞ as a -, and that is exactly the problem. – Antonio Bakula Feb 13 '12 at 12:44
  • Sounds like a bug to me. I can reproduce the issue with .NET 3.5, but it returns -1 as expected with .NET 4.0. – ken2k Feb 13 '12 at 12:47
  • What happens if you explicitly set the culture info to de-DE? – Dennis Traub Feb 13 '12 at 12:53
  • @DennisTraub It doesn't fix the problem on my machine (.NET 3.5). – ken2k Feb 13 '12 at 13:01
  • I updated my question to note that changing the thread culture to German doesn't fix the issue. – Antonio Bakula Feb 13 '12 at 13:07
  • I'm afraid that U+1E9E is undefined according to .NET 3.5, because this character didn't exist in Unicode 4.0 (or whatever version of Unicode .NET 3.5 uses). It's a fairly new addition (the uppercase version of German ß). So the IndexOf function ignores it. If you have any control over the text, you could change the character to ß or SS, whichever is more appropriate. Of course the better solution is to upgrade .NET to v4.0! – Mr Lister Feb 13 '12 at 13:16
  • @Mr Lister, OK, so maybe this is not a bug. I guess it depends on your point of view :) Please write it as an answer so I can accept it. – Antonio Bakula Feb 13 '12 at 13:19
  • But LukeH already gave at least half the answer. You can also accept his. – Mr Lister Feb 13 '12 at 13:22
  • Well, I really think that your comment clarified this issue. The important thing here is that U+1E9E is undefined in .NET 3.5. – Antonio Bakula Feb 13 '12 at 13:24
  • @MrLister I think the OP is right, you should write your comment as an answer so the OP can accept it. – ken2k Feb 13 '12 at 13:38
  • I pasted Mr Lister's comment as an answer and accepted it, and also marked it as Community Wiki. – Antonio Bakula Feb 16 '12 at 12:43
  • Tag `german` removed as part of the [2012 cleanup](http://meta.stackexchange.com/questions/128315/the-great-stack-overflow-tag-question-cleanup-of-2012). – Abhranil Das Apr 30 '12 at 11:55

2 Answers

Since Mr Lister doesn't want his well-deserved upvotes, I will paste his comment here and accept it as the answer.

I'm afraid that U+1E9E is undefined according to .NET 3.5, because this character didn't exist in Unicode 4.0 (or whatever version of Unicode .NET 3.5 uses). It's a fairly new addition (uppercase version of German ß). So the IndexOf function ignores it. If you have any control over the text, you could change the character to ß or SS, whatever is more appropriate. Of course the better solution is to upgrade .NET to v4.0!
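The workaround of swapping the character out could look roughly like the sketch below. `ReplaceCapitalSharpS` is a hypothetical helper name, not part of the framework, and choosing "SS" over "ß" as the replacement is only an assumption about what suits the text. It relies on `String.Replace(string, string)` being an ordinal operation, so it still finds U+1E9E even where the culture-sensitive search ignores it.

```csharp
using System;

public static class CapitalSharpS
{
    // Hypothetical helper: swaps U+1E9E (capital sharp s) for "SS"
    // before the text reaches any culture-sensitive search.
    public static string ReplaceCapitalSharpS(string text)
    {
        // String.Replace(string, string) compares ordinally, so it matches
        // U+1E9E by code unit regardless of the runtime's Unicode tables.
        return text.Replace("\u1E9E", "SS");
    }

    public static void Main()
    {
        string input = "Ärger,-Ökonom-i-Übermut-\u1E9E-ß";
        string cleaned = ReplaceCapitalSharpS(input);

        Console.WriteLine(cleaned);
        // No two adjacent hyphens remain, so an ordinal search returns -1.
        Console.WriteLine(cleaned.IndexOf("--", StringComparison.Ordinal));
    }
}
```

Whether "SS" or "ß" is the right substitute depends on how the text is used afterwards.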

David Anderson
Antonio Bakula

IndexOf uses the current culture if you don't tell it otherwise:

This method performs a word (case-sensitive and culture-sensitive) search using the current culture.

Replace uses an ordinal comparison:

This method performs an ordinal (case-sensitive and culture-insensitive) search to find oldValue.
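Given that difference, one way to make the two methods agree is to ask IndexOf for an ordinal search explicitly. The following is a minimal sketch using the string from the question; `FindOrdinal` is a hypothetical wrapper name introduced only for illustration. The result of the plain `IndexOf("--")` call is deliberately not annotated, since it depends on the runtime's Unicode tables.

```csharp
using System;

public static class OrdinalSearch
{
    // Hypothetical wrapper: searches by code unit, ignoring culture rules.
    public static int FindOrdinal(string text, string value)
    {
        return text.IndexOf(value, StringComparison.Ordinal);
    }

    public static void Main()
    {
        string input = "Ärger,-Ökonom-i-Übermut-\u1E9E-ß";

        // Culture-sensitive search: the result varies with the runtime.
        Console.WriteLine(input.IndexOf("--"));

        // Ordinal search: the string contains no two adjacent hyphens.
        Console.WriteLine(FindOrdinal(input, "--"));        // -1

        // The hyphens around U+1E9E start at index 23.
        Console.WriteLine(FindOrdinal(input, "-\u1E9E-"));  // 23
    }
}
```

With the ordinal overload, IndexOf and Replace use the same matching rules, so a position reported by one corresponds to text the other can replace.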

LukeH
  • Is there something that changed in this respect between .NET 3.5 and .NET 4.0? Because the code works as expected in .NET 4.0. – Darin Dimitrov Feb 13 '12 at 13:13
  • @Darin: Not sure. That behaviour has been documented for as long as I can remember. I'm doing some tests now, but I can't replicate the OP's results in .NET 4 either. – LukeH Feb 13 '12 at 13:15
  • Yes, but in .NET 3.5 the behavior can be reproduced. – Darin Dimitrov Feb 13 '12 at 13:18
  • The string functions haven't changed, but the character classification tables were updated, so U+1E9E is defined now. – Mr Lister Feb 13 '12 at 13:18
  • @MrLister, very interesting. Could definitely lead to some very subtle bugs. – Darin Dimitrov Feb 13 '12 at 13:21
  • @Mr Lister: I think you've hit the nail on the head there. Why not make it into an answer so that we can give you our upvotes? – LukeH Feb 13 '12 at 13:22
  • @DarinDimitrov Sure, but only if you use those new characters in your text. And they are very, very rare! – Mr Lister Feb 13 '12 at 13:24
  • Indeed, it's a rare situation, and for the record we didn't get this by accident from our users. It came from a test for our UrlSanatize method, in which we test all letters of a number of European languages. This uppercase ß was copied from the Wikipedia page on the German language; as I understand it, it's not widely used. – Antonio Bakula Feb 13 '12 at 13:27