We are encountering some encoding issues, especially when using 8bit as the content transfer encoding. First of all, can anyone please tell me what the 8bit-encoded value of a-umlaut (ä) looks like?

What is the best practice for handling encoding?

I tried to use the WriteTo() method of a MIME entity to write the content into a stream, which works in all cases except with 8bit encoding.

UPDATE: I am currently using the code posted in one of the MimeKit examples:

using (MemoryStream memStm = new MemoryStream())
{
    mime.WriteTo(memStm);
    message.MimeMessage = Encoding.UTF8.GetString(memStm.ToArray());
}

But there seems to be some kind of double encoding: when my MIME contains special characters such as äÄ, the result is something like ¿½

How can I avoid this double encoding?

PeterK
grmihel
  • Why are you converting binary data into a string? The content that MimeMessage.WriteTo() outputs should not ever be converted into a string for any conceivable use case. A MimeMessage can have text in multiple charsets throughout the message data, so converting using a single charset encoding can't reliably work. – jstedfast May 13 '15 at 11:08
  • I need the MIME as a string. message.MimeMessage is just a string property. Previously I used mime.ToString() (mime is of type MimeMessage from the MailKit API, sorry for the confusion), but ToString() has issues with Danish characters like æøå, which is why I'm using WriteTo(). So basically, I have a MailKit.MimeMessage, from which I want a string that I can send to my client and display as an .mht file in IE. – grmihel May 13 '15 at 11:19
  • You can't use a string for that, you need to use a byte[] instead. MIME is a compound document where each section can have its own charset. There's no way to convert a compound document with multiple charsets into a string using a single charset converter. – jstedfast May 13 '15 at 13:42
  • So if I want to transfer my MIME message to a client using an HTTP request with an XML document as the data type, then the MIME element HAS to be a byte array instead of my current string type? – grmihel May 19 '15 at 07:21
  • Correct, you'll have to use a byte array for that. – jstedfast May 19 '15 at 14:36
  • I have the exact same problem. I need to save the message to a database, which is why I serialized it as a string; I should have used the byte[] option. – Krishnan Venkiteswaran Mar 30 '17 at 10:09
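
To illustrate the byte[] advice from the comments, here is a minimal sketch of carrying raw message bytes inside an XML element as base64 text. It uses only the .NET base class library. The element name `MimeMessage` and the helper names are hypothetical choices for this sketch, and the sample byte array merely stands in for the output of `memStm.ToArray()` in the question.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class MimeTransportDemo
{
    // Wraps raw message bytes in an XML element as base64 text.
    // The element name "MimeMessage" is a hypothetical choice for this sketch.
    public static XElement ToXml(byte[] rawMime) =>
        new XElement("MimeMessage", Convert.ToBase64String(rawMime));

    // Recovers the exact original bytes on the receiving side.
    public static byte[] FromXml(XElement element) =>
        Convert.FromBase64String(element.Value);

    static void Main()
    {
        // Stand-in for memStm.ToArray(): a fragment containing the 8bit byte E4.
        byte[] rawMime = { 0x48, 0xE4, 0x0D, 0x0A };

        XElement xml = ToXml(rawMime);
        byte[] restored = FromXml(xml);

        Console.WriteLine(xml);
        Console.WriteLine(rawMime.SequenceEqual(restored)); // True
    }
}
```

Because no charset conversion is applied at any point, the bytes survive the trip unchanged regardless of which charsets the individual MIME parts use.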

1 Answer

The 8-bit MIME transfer encoding is basically "no encoding", so any MIME data encoded with 8-bit encoding is the same as the binary representation of the data in the given charset. For instance, 'ä' is represented in UTF-8 as the following sequence of bytes: 0xC3, 0xA4. When using 8-bit, your MIME data will be the very same sequence of bytes. Other transfer encodings like quoted-printable or base64 will encode those bytes differently, as =C3=A4 and w6Q= respectively.
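
To make those byte sequences concrete, here is a minimal C# sketch that uses only the .NET base class library. The `ToQuotedPrintable` helper is a hypothetical, simplified encoder written for illustration; it ignores the 76-character line limit and trailing-whitespace rules, and it is not part of MimeKit.

```csharp
using System;
using System.Linq;
using System.Text;

static class TransferEncodingDemo
{
    // Simplified quoted-printable: escape '=' and every byte outside 0x20-0x7E.
    // Hypothetical helper for illustration; no line wrapping or trailing-space handling.
    public static string ToQuotedPrintable(byte[] data) =>
        string.Concat(data.Select(b =>
            (b >= 32 && b <= 126 && b != (byte)'=') ? ((char)b).ToString() : $"={b:X2}"));

    static void Main()
    {
        byte[] utf8 = Encoding.UTF8.GetBytes("\u00E4"); // 'ä'

        Console.WriteLine(BitConverter.ToString(utf8));  // 8bit, the raw bytes: C3-A4
        Console.WriteLine(ToQuotedPrintable(utf8));      // =C3=A4
        Console.WriteLine(Convert.ToBase64String(utf8)); // w6Q=
    }
}
```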

The takeaway is that the MIME character set specifies how characters are represented in binary form and the MIME content transfer encoding specifies how those bytes get encoded in the MIME document itself.

As for best practices, modern email servers and clients will happily deal with 8-bit encoded emails. Still, the custom is to use either quoted-printable or base64.

As for the double-encoding issue, the sequence äÄ double UTF-8 encoded looks different from ¿½, so I think something else is going wrong there. I am not familiar with MimeKit and your code sample does not contain enough information, but if you update your question with more complete repro code, I will be happy to update my answer.
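
One plausible explanation for the `¿½` output, offered as an assumption since the full message is not shown: if the body is 8bit in a single-byte charset such as iso-8859-1, then 'ä' is the lone byte 0xE4, which is not valid UTF-8 on its own. `Encoding.UTF8.GetString` replaces it with U+FFFD, and that character's UTF-8 bytes (EF BF BD) read as `ï¿½` when later interpreted as Latin-1. A sketch using only the .NET base class library (the helper name is hypothetical):

```csharp
using System;
using System.Text;

static class ReplacementCharDemo
{
    // Decodes bytes as UTF-8 (as the question's code does), then re-encodes the result.
    public static byte[] RoundTripThroughUtf8String(byte[] raw) =>
        Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(raw));

    static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        byte[] raw = latin1.GetBytes("\u00E4");           // 'ä' as 8bit iso-8859-1: E4
        byte[] damaged = RoundTripThroughUtf8String(raw); // E4 alone is invalid UTF-8

        Console.WriteLine(BitConverter.ToString(raw));     // E4
        Console.WriteLine(BitConverter.ToString(damaged)); // EF-BF-BD (U+FFFD)

        // Interpreted as Latin-1, those bytes are U+00EF U+00BF U+00BD.
        Console.WriteLine(latin1.GetString(damaged));
    }
}
```

The original byte is lost at the decoding step, so no later re-encoding can recover it; this is consistent with the advice in the comments to keep the message as a byte[].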

PeterK
  • So if I have a MIME file that looks like Content-Type: text/plain;charset="iso-8859-1"; Content-Transfer-Encoding: 8bit; would ä look like 0xC3 0xA4 if I look at the raw string? – grmihel May 13 '15 at 11:28
  • @grmihel: Well, in that particular case it would be a single byte, 0xE4. This is because `iso-8859-1` can represent the `ä` character, and it represents it as a single 0xE4 byte (see http://www.htmlhelp.com/reference/charset/iso224-255.html, 0xE4 == 228 in decimal). However, if your header read `Content-Type: text/plain;charset="utf-8"; Content-Transfer-Encoding: 8bit`, you would get the two bytes I talked about in my response. – PeterK May 13 '15 at 12:17
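
The charset difference described in the last comment can be checked with a short sketch. It uses only the .NET base class library, and `HexBytes` is a hypothetical helper name.

```csharp
using System;
using System.Text;

static class CharsetDemo
{
    // Returns, as hex, the bytes an 8bit body would contain for the given text and charset.
    public static string HexBytes(string text, string charset) =>
        BitConverter.ToString(Encoding.GetEncoding(charset).GetBytes(text));

    static void Main()
    {
        Console.WriteLine(HexBytes("\u00E4", "iso-8859-1")); // E4
        Console.WriteLine(HexBytes("\u00E4", "utf-8"));      // C3-A4
    }
}
```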