
I have the following code:

string input = "ç";
string normalized = input.Normalize(NormalizationForm.FormD);
char[] chars = normalized.ToCharArray();

I build this code with Visual Studio 2010 and .NET 4 on 64-bit Windows 7.

I run it in a unit test project (platform: Any CPU) in three contexts and check the content of chars:

  • Visual Studio unit tests : chars contains { 231 }.
  • ReSharper : chars contains { 231 }.
  • NCrunch : chars contains { 99, 807 }.

In the MSDN documentation, I could not find anything describing different behaviors.

So why do I get different behaviors? To me, the NCrunch behavior is the expected one, but I would expect the same from the others.

Edit: I switched back to .NET 3.5 and still have the same issue.

Fitzchak Yitzchaki
remio
  • Hmm, I get { 99, 807 } with Visual Studio... This would imply there is something about the configuration of your project... Maybe. – zmilojko May 10 '12 at 08:11
  • @zmilojko, thanks for testing. I get the same results as you in a blank new project. So I am checking the differences between the two projects (WinMerge on the csproj files), but could not find anything relevant yet, which is why I posted this question: to understand which context could induce a different behavior. – remio May 10 '12 at 10:15
  • What is `Thread.CurrentThread.CurrentCulture` in each case? – AakashM May 11 '12 at 12:48
  • How do you 'check the content of `chars`'? – Colonel Panic May 11 '12 at 12:59
  • @AakashM, In all cases, `Thread.CurrentThread.CurrentCulture` is `fr-FR`. I also checked `Thread.CurrentThread.CurrentUICulture` which is `en-US` in all cases. – remio May 11 '12 at 13:03
  • @MattHickford, I gently move my mouse over the `chars` variable in the debugger, then unfold the `+` sign. – remio May 11 '12 at 13:05
  • @AakashM, I used the `ç` character in my example, but I get the same behavior with all of the French accented characters I have tested. – remio May 11 '12 at 13:36
  • If I had to guess, I'd say something strange is going on with the build configurations, causing an old version of the code to be run by ReSharper and Visual Studio, but one that NCrunch ignores. For example, a library set to build the Any CPU configuration, but the GUI set to x86. – Phil Martin May 12 '12 at 07:51
  • @PhilMartin, I am also suspicious about something like that. So, I cleaned it all (hopefully), rebuilt it, also tried it on another computer. Several times. Same result. – remio May 12 '12 at 08:10
  • @PhilMartin, However, I would be really interested in understanding which parameter makes `string.Normalize` behave differently. – remio May 12 '12 at 08:17

1 Answer

The documentation for String.Normalize(NormalizationForm) says that the returned string's

binary representation is in the normalization form specified by the normalizationForm parameter.

which means you'd be using FormD normalization in both cases, so CurrentCulture and the like should not really matter.

The only thing I can think of that could differ, then, is the "ç" character itself. That character is interpreted according to whatever character encoding is assumed or configured for the Visual Studio source code files. In short, I think NCrunch is assuming a different source file encoding than the others.

Based on a quick search of the NCrunch forum, there was a mention of some UTF-8 -> UTF-16 conversion, so I would check that.
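
One way to take the source file encoding out of the picture is to write the character as a Unicode escape and print the code units before and after normalization, instead of relying on the debugger. The following is only a minimal sketch: the `NormalizationCheck` class and its `Dump` helper are hypothetical names made up for this illustration, not part of the original project.

```csharp
using System;
using System.Linq;
using System.Text;

public static class NormalizationCheck
{
    // Hypothetical helper: lists the UTF-16 code units of a string as decimal numbers.
    public static string Dump(string s)
    {
        return string.Join(", ", s.Select(c => ((int)c).ToString()).ToArray());
    }

    public static void Main()
    {
        // "\u00E7" is U+00E7 (decimal 231) regardless of how the source file is encoded.
        string input = "\u00E7";
        string normalized = input.Normalize(NormalizationForm.FormD);

        Console.WriteLine("input:      { " + Dump(input) + " }");
        Console.WriteLine("normalized: { " + Dump(normalized) + " }");
    }
}
```

Unicode defines the canonical decomposition of U+00E7 as U+0063 followed by U+0327, that is { 99, 807 }. If the escaped literal decomposes as expected while the literal typed as "ç" does not, that would point toward how the source file is being read rather than toward `Normalize` itself.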

eis
  • Indeed, I was strongly suspecting the encoding of the `ç` character in the source / runtime code. I started playing with the encoding of the source file with no luck. Then, I tried to read the string from an external file, which failed until I forced its encoding to `UTF-8`. Finally, I updated my declaration of `input` to `string input = new string(new[] { (char)231 });`, and... it works! – remio May 14 '12 at 13:13