
I've got an Azure VM with a number of files on it. Some of these files are pretty messed up. For example, they contain a UTF-8 BOM along with non-UTF-8 characters, in particular smart quotes like these:

<option ef="“Late”" />
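A quick way to see what such a file really contains is to look at its raw bytes rather than at StreamReader.CurrentEncoding. The sketch below checks for the UTF-8 BOM (EF BB BF) and then strictly decodes the rest as UTF-8. The class and method names are hypothetical. It assumes the smart quotes were saved as single Windows-1252 bytes (0x93 and 0x94), which are not valid UTF-8 on their own. If they were instead saved as proper UTF-8 (E2 80 9C / E2 80 9D), the file would decode cleanly.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper: inspects raw bytes instead of trusting StreamReader.CurrentEncoding.
public static class EncodingProbe
{
    // The UTF-8 byte order mark.
    private static readonly byte[] Utf8Bom = { 0xEF, 0xBB, 0xBF };

    public static bool HasUtf8Bom(byte[] bytes)
    {
        return bytes.Length >= Utf8Bom.Length
            && bytes[0] == Utf8Bom[0]
            && bytes[1] == Utf8Bom[1]
            && bytes[2] == Utf8Bom[2];
    }

    // True if everything after an optional BOM decodes as strict UTF-8.
    public static bool IsValidUtf8(byte[] bytes)
    {
        // throwOnInvalidBytes: true raises DecoderFallbackException instead of
        // silently substituting U+FFFD for invalid sequences.
        var strict = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
        int offset = HasUtf8Bom(bytes) ? Utf8Bom.Length : 0;
        try
        {
            strict.GetString(bytes, offset, bytes.Length - offset);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    // One-line summary for logging, e.g. from the fix-up utility itself.
    public static string Describe(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        return string.Format("{0}: BOM={1}, validUtf8={2}", path, HasUtf8Bom(bytes), IsValidUtf8(bytes));
    }
}
```

Under that assumption, a file with the BOM followed by Windows-1252 quotes should be reported as BOM=True, validUtf8=False.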

To fix this, I have a small C# utility that opens a StreamReader on each file:

StreamReader sr = new StreamReader(filename, Encoding.ASCII, true);

It then calls .ReadToEnd() and checks CurrentEncoding. If I run this process in a PowerShell window, CurrentEncoding is System.Text.ASCIIEncoding, as expected because of the smart quotes and because that's what it reports when run anywhere else. If I run it inside a Chocolatey package or through Octopus Deploy, CurrentEncoding is System.Text.UTF8Encoding.

I'm calling .ReadToEnd() because MSDN says encoding autodetection is not done until the first call to a Read method, so CurrentEncoding is only meaningful after a read. What is different about Chocolatey and Octopus Deploy that makes StreamReader guess the wrong encoding?
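For reference, here is a minimal sketch of the documented StreamReader behavior, using an in-memory stream instead of a file (the class name is hypothetical). With detectEncodingFromByteOrderMarks set to true, the reader only looks for a byte order mark at the start of the stream. The rest of the content, smart quotes included, is not examined, so as far as I understand the documentation, Encoding.ASCII is kept unless a BOM is found.

```csharp
using System.IO;
using System.Text;

// Hypothetical demo of StreamReader's BOM detection.
public static class CurrentEncodingDemo
{
    // Opens the bytes the same way as the utility (ASCII + detectEncodingFromByteOrderMarks)
    // and returns the encoding the reader settled on.
    public static Encoding DetectedEncoding(byte[] bytes)
    {
        using (var reader = new StreamReader(new MemoryStream(bytes), Encoding.ASCII, true))
        {
            // Detection happens on the first read, so CurrentEncoding is checked afterwards.
            reader.ReadToEnd();
            return reader.CurrentEncoding;
        }
    }
}
```

Passing the same quote bytes with and without a leading EF BB BF should yield a UTF8Encoding and an ASCIIEncoding respectively. If that holds in every environment, a run that reports ASCIIEncoding for a file starting with a UTF-8 BOM would be surprising, which suggests comparing the exact bytes each run actually opens.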

sirdank
  • What do you mean by saying "accurate"? Encoding is variable by its nature. – Tigran May 03 '17 at 18:06
  • What makes you think that's not accurate? If Chocolatey changes the encoding to UTF-8, then your process will be launched as UTF-8. – Gusman May 03 '17 at 18:07
  • @Tigran I know these files are ASCII because everywhere this is run except through my new chocolatey package, it detects them as ASCII and converts them to UTF8. – sirdank May 03 '17 at 18:13
  • @Gusman Please see my comment above and also my post update. – sirdank May 03 '17 at 18:16
  • As long as a file has no BOM, there is no implicit encoding; the raw data can be interpreted differently. – LotPings May 03 '17 at 18:20
  • Just a note: it says "The value can be different after the first call to any Read method", which doesn't mean it will always set the correct encoding. If the file has a BOM or any other tag that identifies the encoding, then the stream will change it. – Gusman May 03 '17 at 18:21
  • @LotPings It still seems like the same code, run against the same file, should produce the same result even if it could plausibly be interpreted differently under different circumstances. – sirdank May 03 '17 at 18:23
  • @Gusman You are correct but it would appear, since I'm specifying ASCII at the time of instantiation, that running under chocolatey is actually producing an incorrect change to UTF8. – sirdank May 03 '17 at 18:25
  • Maybe if you post all the info at once instead of small pieces on edits we could see the full picture... – Gusman May 03 '17 at 18:26
  • @Gusman Sorry. Let me know if I can post anything else. – sirdank May 03 '17 at 18:28
  • @sirdank We are going to need a lot more context here - plus this looks like a bug - best to log it at https://github.com/chocolatey/choco/issues so that we can triage it better and prioritize it. HTH – ferventcoder May 05 '17 at 03:50
  • @ferventcoder QA said it happens through chocolatey as well as octopus deploy so I'm no longer sure it's related to chocolatey. I can still submit this as a bug if you think I ought to. – sirdank May 05 '17 at 18:45
  • It couldn't hurt - best that we are aware it is occurring. – ferventcoder May 05 '17 at 18:48
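Following up on the request for more context, here is a sketch of a diagnostic the utility could log in each environment (the class and method names are hypothetical). Which of these values actually matter here is an assumption. The resolved path and leading bytes are the most direct way to confirm that the PowerShell, Chocolatey and Octopus Deploy runs are reading the same file, since a relative filename is resolved against the working directory, which may differ between an interactive shell and a deployment agent.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

// Hypothetical diagnostic: facts about this process that may differ between
// an interactive PowerShell session and a deployment agent.
public static class RunContext
{
    public static string Describe(string filename)
    {
        var sb = new StringBuilder();
        // A relative filename is resolved against the current working directory.
        string fullPath = Path.GetFullPath(filename);
        sb.AppendLine("Full path:         " + fullPath);
        sb.AppendLine("Working directory: " + Environment.CurrentDirectory);
        sb.AppendLine("User:              " + Environment.UserName);
        sb.AppendLine("Culture:           " + CultureInfo.CurrentCulture.Name);
        sb.AppendLine("Encoding.Default:  " + Encoding.Default.WebName);
        sb.AppendLine("64-bit process:    " + Environment.Is64BitProcess);
        sb.AppendLine("Runtime version:   " + Environment.Version);

        if (File.Exists(fullPath))
        {
            // Leading bytes show whether this run sees a BOM (EF-BB-BF for UTF-8).
            byte[] head = new byte[4];
            int read;
            using (FileStream fs = File.OpenRead(fullPath))
            {
                read = fs.Read(head, 0, head.Length);
            }
            sb.AppendLine("First bytes:       " + BitConverter.ToString(head, 0, read));
        }
        else
        {
            sb.AppendLine("File not found at the resolved path.");
        }
        return sb.ToString();
    }
}
```

Logging RunContext.Describe(filename) next to CurrentEncoding in each environment would show whether the runs differ in the file they open or only in how it is decoded.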

0 Answers