
I was thinking about the mathematics of how string.Compare() works in C#.

Is it possible for two unequal strings to ever return 0 on this method call?

I'm referring to strings that are genuinely unequal, such as "Herp" and "Derp", not "Herp" and "Hěrp".

Unfortunately, apart from the basic null checks, the source code for string.Compare is all internal calls into code outside of managed .NET.

I believe this is the actual C++ code used for this, but it is difficult to be sure.

The cases I'm considering:

  • strange ordinal behavior (just permutations of strings that end up being equal)
  • Overflowing an integer, causing a positive and negative number for the comparisons, resulting in a 0
  • Anything else crazy that someone more versed in the mscorlib implementation than I am might know about
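The overflow worry in the second bullet can be checked with a little arithmetic. Below is a minimal sketch of an ordinal comparison, assuming the common strategy of returning the difference between the first mismatching characters, or the difference in lengths when one string is a prefix of the other. `OrdinalCompareSketch` is a hypothetical name and this is not the actual mscorlib code. Because a C# `char` is an unsigned 16-bit UTF-16 code unit and string lengths are never negative, neither subtraction can overflow an `int`.

```csharp
using System;

// Hypothetical helper, not the framework's implementation: returns the
// difference of the first mismatching UTF-16 code units, or the length
// difference if one string is a prefix of the other.
static int OrdinalCompareSketch(string a, string b)
{
    int n = Math.Min(a.Length, b.Length);
    for (int i = 0; i < n; i++)
    {
        if (a[i] != b[i])
        {
            // Each char is in 0..65535, so the difference lies in
            // [-65535, 65535] and cannot overflow a 32-bit int.
            return a[i] - b[i];
        }
    }
    // Both lengths are in [0, int.MaxValue], so this cannot overflow either.
    return a.Length - b.Length;
}

Console.WriteLine(OrdinalCompareSketch("Herp", "Derp"));      // 4 ('H' is 72, 'D' is 68)
Console.WriteLine(OrdinalCompareSketch("\uFFFF", "\u0000"));  // 65535, the largest possible char difference
Console.WriteLine(OrdinalCompareSketch("abc", "abcc"));       // -1
```

Under this strategy a result of 0 is only possible when every code unit matches and the lengths are equal, i.e. when the strings are identical.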

There isn't a specific reason for asking this - just curiosity. And I hadn't seen it asked before for C#!

Codeman
  • @GrantWinney go down the rabbit hole - the CLR code only takes care of the null cases. If the strings are to be genuinely compared, it goes to C++ code. – Codeman May 12 '15 at 23:26
  • Note, there was a bug with string comparison introduced in .NET 4.0 that would break the transitive and antisymmetric properties, thus breaking sort order. I am not sure whether it is fixed yet. Refer to http://stackoverflow.com/questions/13254153/bug-in-the-string-comparing-of-the-net-framework?lq=1, or to the question that first brought me to SO if you want to read a long bedtime story :D, here: http://stackoverflow.com/questions/17599084/c-sharp-sortedliststring-tvalue-containskey-for-successfully-added-key-return – Alex May 12 '15 at 23:35
  • I see. Basically "does string comparison code in CLR have known bugs" - I'd guess answer would be likely not, but you'd need someone from CLR team to answer. – Alexei Levenkov May 12 '15 at 23:36
  • It depends a bit on what you mean by unequal, but an example: `String.Compare("ss", "ß", false, CultureInfo.InvariantCulture) == 0`. – Guffa May 12 '15 at 23:37
  • If you are calling a variant that does an Ordinal comparison the code is available [here](http://referencesource.microsoft.com/#mscorlib/system/string.cs,8711fff131bc4d0e) (this is the byte by byte one that always works). Note that the main call to [string.Compare](http://referencesource.microsoft.com/#mscorlib/system/string.cs,0be9474bc8e160b6) handles `null` values and has a quick check for the first character being different. – Guvante May 13 '15 at 00:01
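The null handling mentioned in the last comment is documented behavior of `String.Compare`: `null` compares less than any string (including the empty string), and two `null`s compare equal. A short sketch using the ordinal overload:

```csharp
using System;

// Two nulls are equal.
Console.WriteLine(string.Compare(null, null, StringComparison.Ordinal));       // 0
// null is less than any string, even "".
Console.WriteLine(string.Compare(null, "", StringComparison.Ordinal) < 0);     // True
Console.WriteLine(string.Compare("", null, StringComparison.Ordinal) > 0);     // True

// Strings that differ in their first character are ordered by that character.
Console.WriteLine(string.Compare("Herp", "Derp", StringComparison.Ordinal) > 0); // True
```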

1 Answer


I believe that the answer to your question is technically yes, depending on which overload you call and which option parameters you pass in. According to the MSDN docs, it is possible to do the comparison with a culture that has strange rules for ordering characters, or that even skips certain characters:

Notes to Callers

Character sets include ignorable characters. The Compare(String, String) method does not consider such characters when it performs a culture-sensitive comparison. For example, if the following code is run on the .NET Framework 4 or later, a culture-sensitive comparison of "animal" with "ani-mal" (using a soft hyphen, or U+00AD) indicates that the two strings are equivalent.
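A small sketch of the documented soft-hyphen case, plus the `"ss"` vs `"ß"` example from the comments. The culture-sensitive results are printed but deliberately not stated here, because they depend on the runtime's globalization library (NLS on Windows .NET Framework, ICU on newer runtimes, or invariant mode, where culture comparisons behave ordinally). The ordinal results do not depend on culture.

```csharp
using System;
using System.Globalization;

string plain = "animal";
string softHyphen = "ani\u00ADmal"; // U+00AD SOFT HYPHEN between "ani" and "mal"

// Culture-sensitive: documented to return 0 on .NET Framework 4+ because the
// soft hyphen is ignorable; the exact value depends on the globalization library.
Console.WriteLine(string.Compare(plain, softHyphen, StringComparison.InvariantCulture));
Console.WriteLine(string.Compare("ss", "\u00DF", false, CultureInfo.InvariantCulture));

// Ordinal: every UTF-16 code unit counts, so these are never reported equal.
Console.WriteLine(string.Compare(plain, softHyphen, StringComparison.Ordinal) != 0); // True
Console.WriteLine(string.CompareOrdinal("ss", "\u00DF") != 0);                      // True
Console.WriteLine(plain.Length + " vs " + softHyphen.Length);                       // 6 vs 7
```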

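If you want to ignore culture and just compare the raw values of two strings, call the overload String.Compare(s1, s2, StringComparison.Ordinal). This compares the strings' UTF-16 code units one by one, which is essentially a byte-by-byte comparison. StringComparison.OrdinalIgnoreCase works the same way except that it treats letters differing only in case as equal, so "Herp" and "herp" compare as 0 under it. Docs: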

Notes to Callers ... To recognize ignorable characters in your comparison, supply a value of StringComparison.Ordinal or OrdinalIgnoreCase for the comparisonType parameter.
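A short sketch contrasting `StringComparison.Ordinal` with `StringComparison.OrdinalIgnoreCase`, using the strings from the question:

```csharp
using System;

// Ordinal: compares UTF-16 code units; 'H' (72) sorts after 'D' (68).
Console.WriteLine(string.Compare("Herp", "Derp", StringComparison.Ordinal) > 0);            // True
// Ordinal: case matters, so these are unequal.
Console.WriteLine(string.Compare("Herp", "herp", StringComparison.Ordinal) != 0);           // True
// OrdinalIgnoreCase: case differences are folded away, so these compare equal.
Console.WriteLine(string.Compare("Herp", "herp", StringComparison.OrdinalIgnoreCase) == 0); // True
// Ordinal: 'e' (U+0065) and 'ě' (U+011B) are different code units.
Console.WriteLine(string.Compare("Herp", "H\u011Brp", StringComparison.Ordinal) != 0);      // True
```

With `Ordinal`, a result of 0 means the two strings contain exactly the same sequence of code units.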

Note that the definition of a "greater" or "lesser" string is not necessarily obvious. For example, is the string "abc" greater than or less than "abcc"? .NET is clear that it is less for the purposes of string comparison. But it's good to read the docs carefully before relying on such edge cases:

The comparison terminates when an inequality is discovered or both strings have been compared. However, if the two strings compare equal to the end of one string, and the other string has characters remaining, the string with remaining characters is considered greater. The return value is the result of the last comparison performed.
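A quick sketch of the rule quoted above using ordinal comparison: a character mismatch decides the result first, and length only matters when one string is a prefix of the other.

```csharp
using System;

// "abc" is a prefix of "abcc", so the longer string is greater.
Console.WriteLine(string.Compare("abc", "abcc", StringComparison.Ordinal) < 0); // True

// 'd' (100) > 'c' (99) at index 2, so "abd" > "abcc" even though it is shorter.
Console.WriteLine(string.Compare("abd", "abcc", StringComparison.Ordinal) > 0); // True

// Sorting ordinally shows the resulting order.
string[] words = { "abcc", "abd", "abc" };
Array.Sort(words, StringComparer.Ordinal);
Console.WriteLine(string.Join(", ", words)); // abc, abcc, abd
```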

Jordan Rieger
  • I'm not referring to Culture, I'm referring to strings that are genuinely unequal such as "Herp" and "Derp", not "Herp" and "Hěrp" – Codeman May 12 '15 at 23:30
  • How are you defining "genuinely unequal"? – Martin Smith May 12 '15 at 23:31
  • @Pheonixblade9: No, these strings should always be different, since the comparison is lexicographic: if two characters differ, so do the strings. The point is that what counts as a character, and as equal characters, can of course be a bit vague. – Willem Van Onsem May 12 '15 at 23:32
  • @CommuSoft I'm not asking about simple cases, I'm asking about crazy edge cases. I know that short strings will work correctly :) – Codeman May 12 '15 at 23:32
  • @JordanRieger your answer is not incorrect, but it's not what I'm looking for, unfortunately. – Codeman May 12 '15 at 23:34
  • @Pheonixblade9 It sounds like you're referring to "Ordinal" string comparison, e.g. String.Compare(string1, string2, StringComparison.Ordinal), in which case the answer is no: if you specify an ordinal comparison, you will get a byte-by-byte comparison. – Jordan Rieger May 12 '15 at 23:34
  • @JordanRieger can you expand upon that a bit and incorporate it in your answer, please? I'll edit my question to specify that I am asking about ordinal comparisons. Perhaps a bit of background about how ordinal comparisons work to explain? Like I said - I asked this because I thought it was a good question, not because it's something I need now. I wanted it here for others :) – Codeman May 12 '15 at 23:35
  • @Pheonixblade9 OK, I've expanded it. Like other commenters have said, it basically depends on your semantics for "equal" and "unequal". (Of course, bugs in the framework would also affect it, but I'd consider that outside the scope of the question.) – Jordan Rieger May 12 '15 at 23:51
  • It was really about bugs in the framework, I suppose, but I'm not sure that's within the ability of SO to answer :) – Codeman May 12 '15 at 23:54