According to Spolsky I can't call myself a developer, so there is a lot of shame behind this question...

Scenario: From a C# application, I would like to take a string value from a SQL db and use it as the name of a directory. I have a secure (SSL) FTP server on which I want to set the current directory using the string value from the DB.
Problem: Everything is working fine until I hit a string value with a "special" character - I seem unable to encode the directory name correctly to satisfy the FTP server.

The code example below

  • uses "special" character é as an example
  • uses WinSCP as an external application for the ftps comms
  • does not show all the code required to setup the Process "_winscp".
  • sends commands to the WinSCP exe by writing to the process standardinput
  • for simplicity, does not get the info from the DB, but instead simply declares a string (but I did do a .Equals to confirm that the value from the DB is the same as the declared string)
  • makes three attempts to set the current directory on the FTP server using different string encodings - all of which fail
  • makes an attempt to set the directory using a string that was created from a hand-crafted byte array - which works

Process _winscp = new Process();
byte[] buffer;

string nameFromString = "Sinéad O'Connor";
_winscp.StandardInput.WriteLine("cd \"" + nameFromString + "\"");

buffer = Encoding.UTF8.GetBytes(nameFromString);
_winscp.StandardInput.WriteLine("cd \"" + Encoding.UTF8.GetString(buffer) + "\"");

buffer = Encoding.ASCII.GetBytes(nameFromString);
_winscp.StandardInput.WriteLine("cd \"" + Encoding.ASCII.GetString(buffer) + "\"");

byte[] nameFromBytes = new byte[] { 83, 105, 110, 130, 97, 100, 32, 79, 39, 67, 111, 110, 110, 111, 114 };
_winscp.StandardInput.WriteLine("cd \"" + Encoding.Default.GetString(nameFromBytes) + "\"");

The UTF8 encoding changes é to 101 (decimal) but the FTP server doesn't like it.

The ASCII encoding changes é to 63 (decimal) but the FTP server doesn't like it.

When I represent é as value 130 (decimal) the FTP server is happy, but I can't find a method that will do this for me (I had to manually construct the string from explicit bytes).

Anyone know what I should do to my string to encode the é as 130 and make the FTP server happy and finally elevate me to level 1 developer by explaining the only single thing a developer should understand?

Handleman
  • That winscp process is part of the problem, it is a console mode app that operates in code page 437, the old IBM PC encoding. Where 130 is indeed the character code for é. The StandardInput stream normally automatically takes care of the translation but your code is very strange. It cannot work as given in the snippet, the process has to be started first. Lose winscp, use System.Net with its support for FTP. – Hans Passant Feb 25 '11 at 07:03
  • Thanks for the info Hans. I realise the snippet doesn't work as is (I cut out all the process initialisation code). I would love to use some native .net FTP support - but can it support FTP over SSL (ie. ftps)? – Handleman Feb 25 '11 at 07:11
  • Just for completeness for future developers - I took Hans' suggestion and looked at the native .net FTP libraries and they can handle ftps - so I very quickly switched the code and now no longer rely on the external WinSCP app, and there seem to be no problems with the encoding - it just works. Very happy with not having an external app, much simpler code and better performance. – Handleman Feb 28 '11 at 03:28
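
For readers who want to try the native route mentioned in the comments, here is a minimal sketch of building an FTP-over-SSL request with System.Net. The class name `FtpsSketch`, the method `CreateListRequest` and its parameters are hypothetical names chosen for illustration; the thread does not show the code Handleman ended up with. It assumes the server accepts explicit FTPS, which is what `EnableSsl` requests.

```csharp
using System;
using System.Net;

static class FtpsSketch
{
    // Builds (but does not send) a directory-listing request over explicit FTPS.
    // host, directory, user and password are placeholders supplied by the caller.
    public static FtpWebRequest CreateListRequest(string host, string directory, string user, string password)
    {
        // Escape the directory name so spaces and non-ASCII characters survive in the URI.
        var uri = new Uri("ftp://" + host + "/" + Uri.EscapeDataString(directory) + "/");

        var request = (FtpWebRequest)WebRequest.Create(uri);
        request.Method = WebRequestMethods.Ftp.ListDirectory;
        request.EnableSsl = true; // ask for FTP over SSL/TLS
        request.Credentials = new NetworkCredential(user, password);
        return request;
    }
}
```

Calling `GetResponse()` on the returned request performs the actual exchange. On recent .NET versions `WebRequest.Create` compiles with an obsolescence warning, so treat this as a sketch for the .NET Framework era of the question.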

2 Answers

4

130 isn't ASCII: ASCII is only 7 bits (see the Encoding.ASCII documentation), so the encoder replaces the "é" with a plain "?" (63) because it has nothing better to do. UTF-8 actually encodes the character as two bytes (decimal 195 and 169), which preserves the code point.
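
A small sketch to make those byte values visible. The helper name `Describe` is made up for this example; it only prints the encoded bytes of a string as decimal numbers.

```csharp
using System;
using System.Text;

static class EncodingBytes
{
    // Returns the encoded bytes of text as space-separated decimal values.
    public static string Describe(Encoding encoding, string text)
    {
        return string.Join(" ", encoding.GetBytes(text));
    }

    static void Main()
    {
        const string e = "\u00E9"; // é
        Console.WriteLine("ASCII:  " + Describe(Encoding.ASCII, e));   // 63 (the "?" replacement)
        Console.WriteLine("UTF-8:  " + Describe(Encoding.UTF8, e));    // 195 169
        Console.WriteLine("UTF-16: " + Describe(Encoding.Unicode, e)); // 233 0
    }
}
```

None of these produce 130, which is why round-tripping through ASCII or UTF-8 cannot give the byte the console application expects.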

Use an explicit code page, such as Latin (CP 1252); it needs to match whatever the other side expects. As the output below shows, CP 1252 encodes é as 233, not 130, so it is not the encoding you need :-) But the same principle applies: use the encoding for a specific code page.

Edit: As Hans Passant explained in a comment, the code page to use here is MS-DOS (CP 437), which encodes é as the desired 130.

// LINQPad -- Encoding is System.Text.Encoding
var enc = Encoding.GetEncoding(1252);
string.Join(" ", enc.GetBytes("Sinéad O'Connor")).Dump();
// -> 83 105 110 233 97 100 32 79 39 67 111 110 110 111 114
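
The same check with code page 437, as a standalone sketch. The `RegisterProvider` line is an assumption about the runtime: it is needed on .NET Core and .NET 5+, where legacy code pages are not available by default, and should be dropped on .NET Framework, where 437 is built in.

```csharp
using System;
using System.Text;

static class Cp437Demo
{
    // Encodes text with the old IBM PC code page (CP 437).
    public static byte[] Encode(string text)
    {
        // Only for .NET Core / .NET 5+; remove this line on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(437).GetBytes(text);
    }

    static void Main()
    {
        Console.WriteLine(string.Join(" ", Encode("Sin\u00E9ad O'Connor")));
        // -> 83 105 110 130 97 100 32 79 39 67 111 110 110 111 114
    }
}
```

This matches the hand-crafted byte array from the question.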

See: http://msdn.microsoft.com/en-us/goglobal/bb688114 for more.

Happy coding.

Btw. good selection in artists -- if it was intentional :p

  • Thanks pst and kudos to Hans. For those interested my code now looks like: string nameFromString = "Sinéad O'Connor"; byte[] buffer = Encoding.GetEncoding(437).GetBytes(nameFromString); _winscp.StandardInput.WriteLine("cd \"" + Encoding.Default.GetString(buffer) + "\""); – Handleman Feb 25 '11 at 08:07
1

I think the problem here is that ALL .NET strings are Unicode. A .NET string has no "which encoding am I" property. So with Encoding.ASCII.GetString(buffer) you just convert your ASCII bytes back into a Unicode string.

I think your problem should be solved by changing the encoding of Process.StandardInput, so that WinSCP receives the bytes in the encoding it expects.

OR

You should check what Encoding.Default is, because I'm pretty sure it's not UTF8 or ASCII.
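
One way to sketch the first suggestion, assuming the console application reads its input as code page 437 (as stated in the comments on the question): wrap the raw input stream in a StreamWriter with that encoding instead of writing through the default writer. `ConsoleInputWriter` and `Create` are hypothetical names, and a MemoryStream stands in for `process.StandardInput.BaseStream` so the sketch runs without WinSCP.

```csharp
using System;
using System.IO;
using System.Text;

static class ConsoleInputWriter
{
    // Wraps any stream in a writer that encodes text with the given code page.
    public static StreamWriter Create(Stream target, int codePage)
    {
        // Only for .NET Core / .NET 5+; remove this line on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return new StreamWriter(target, Encoding.GetEncoding(codePage)) { AutoFlush = true };
    }

    static void Main()
    {
        // Stand-in for process.StandardInput.BaseStream of a started process.
        var fake = new MemoryStream();
        using (var writer = Create(fake, 437))
        {
            writer.Write("cd \"Sin\u00E9ad O'Connor\"");
            // Show the bytes the child process would receive.
            Console.WriteLine(string.Join(" ", fake.ToArray()));
        }
    }
}
```

In the printed bytes the é shows up as 130. With a real process, the stream passed to `Create` would be the redirected standard input of the started process, and every command would be written through the returned writer.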

Euphoric
  • Thanks Euphoric. I did find a way to set the Process.StandardInput encoding, and I only tried UTF8, but it didn't seem to help (at home now without the code, will put it up on monday). I'm not too worried about the Default encoding as it's just a way to get my byte array version into a string to test with. – Handleman Feb 25 '11 at 07:14
  • @pst: I didn't mean that strings don't have any encoding at all. I meant that you can't choose what encoding a string is in. It's always UTF-16. @Handleman: Well, now you can see that this "Default" encoding was neither UTF-8 nor ASCII, but your locale encoding. – Euphoric Feb 25 '11 at 08:19