My understanding is that the original SMTP protocol was defined to limit transmission to characters using only 7 bits, to save on transmission costs.

This protocol is almost 40 years old, and since then multiple RFCs have extended the standards.

For compatibility reasons, many if not most modern servers, even though they are 8-bit clean, convert the message into a "7bit compatible" format such as quoted-printable or base64.

So technically, all the characters are 7bit ASCII.
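
To make the "7bit compatible" point concrete, here is a minimal sketch using Python's standard `quopri` and `base64` modules. The sample text and the helper `is_7bit` are hypothetical names chosen for illustration; the sketch only checks that the encoded forms contain no octet with the high bit set, and says nothing about what a particular server does.

```python
import base64
import quopri


def is_7bit(data: bytes) -> bool:
    """Return True when every octet has its high bit (0x80) clear."""
    return all(b < 0x80 for b in data)


# Hypothetical sample text containing non-ASCII characters.
raw = "Café naïve".encode("utf-8")

qp = quopri.encodestring(raw)  # quoted-printable transfer encoding
b64 = base64.b64encode(raw)    # base64 transfer encoding

print(is_7bit(raw))  # the UTF-8 octets for é and ï have the high bit set
print(is_7bit(qp))   # quoted-printable output is plain ASCII
print(is_7bit(b64))  # base64 output is plain ASCII
```

Both encodings are reversible, so the receiving side can recover the original 8-bit octets from the ASCII-only form.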

However, the crux of my question is: even if data is encoded in a 7bit friendly way, does the physical transmission of bits between SMTP servers occur in 7-bit units, or in 8-bit units?

My assumption is that it happens in 8bits, even if the data is encoded in ASCII. Is this correct?

Here are some relevant links I found:

<< Users send billions of 8-bit messages every year. As far as I know, all servers can handle 8-bit messages. A few years ago I was able to find a few hosts running ancient 7-bit versions of sendmail, but I don't see any now.>>

http://cr.yp.to/smtp/8bitmime.html

<< In practice, however, the body is typically encoded using all eight bits. >>

https://www.ibm.com/support/knowledgecenter/en/SSB27U_6.4.0/com.ibm.zvm.v640.kiml0/smtmlfr.htm

<< This does not cause problems in practice, since virtually all modern mail relays are 8-bit clean >>

https://en.wikipedia.org/wiki/Simple_Mail_Transfer_Protocol#8BITMIME

Update

The refinement of my question should be stated as: Do SMTP servers today still clear the high bit, and encode the 7bit ASCII using only the lower seven bits, or do they actually use the full octet, giving significance to the MSB?

Cyril Graze

1 Answer

I think what you are asking is: "Do SMTP clients shift bits when sending messages to an SMTP server such that each character only uses 7 bits and the 8th bit is the start of the next character?"

If so, no. That has never been the case.

Since the very beginning, SMTP clients and servers have transmitted each character as a full 8-bit octet.

The restriction was on the character set, not on the framing: SMTP clients and servers used the ASCII character encoding, which occupies only the values 0-127 and does not include the accented characters found in 8bit character encodings such as ISO-8859-1. Octet values above 127 are undefined in ASCII.

There are likely a number of reasons for this:

  1. ASCII is simple to support
  2. Every locale had their own preferred extended character encoding that was not compatible with other locales - some of which required more than a single byte to represent a character.
  3. Neither UTF-8 nor Unicode existed yet (RFC 821 dates from 1982, Unicode 1.0 from 1991, and UTF-8 from 1992)
  4. It was difficult and unrealistic to expect so much software to implement character set conversion between all of the widely used character sets (unicode and charset conversion libraries were not as widely available at the time)
  5. The "MESSAGE" specification that preceded MIME, SMTP, etc. was written for the US "internet" and likely didn't need anything outside of ASCII (which is why the original message specifications, e.g. RFC 822 and earlier, did not define encoding mechanisms).
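
To illustrate the point made earlier in this answer (one octet per character, no bit shifting), here is a rough Python sketch. The command line and the helper `packed_7bit_length` are made up for illustration; the "packed" scheme is the hypothetical one the question asks about, and it is shown only for contrast.

```python
def packed_7bit_length(n_chars: int) -> int:
    """Octets needed if 7-bit characters were packed back to back.

    This is the hypothetical scheme SMTP does NOT use.
    """
    return (n_chars * 7 + 7) // 8


# Hypothetical SMTP command line, as a client would write it to the socket.
line = "EHLO example.org\r\n"
octets = line.encode("ascii")

# One octet per character: nothing is shifted into a neighbouring character.
print(len(line), len(octets), packed_7bit_length(len(line)))

# Each octet carries a 7-bit ASCII value, so its most significant bit is 0.
for ch, b in zip(line[:4], octets[:4]):
    print(ch, format(b, "08b"))
```

The binary column always starts with `0` for ASCII input, which is all that "7bit" means here: the eighth bit is present on the wire but unused.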
jstedfast
  • Thanks @jstedfast. I've marked your answer correct as it does answer the main part of my question. However, I think what I was trying to communicate is whether or not SMTP servers still clear the MSB in the octet, not whether they would transmit 7-bit units, as I agree that has never happened. So the refinement of my question should be stated as: Do SMTP servers today clear the high bit, and encode the 7bit ASCII using only seven bits, or do they actually use the full octet, sometimes giving meaning to the MSB? – Cyril Graze Oct 18 '20 at 15:32
  • I don't think they ever cleared the MSB. I think it was just a problem because it might be interpreted differently depending on what the system's locale charset was. Don't quote me on that, though. I just don't think programmers would have gone to the effort of clearing the MSB. – jstedfast Oct 19 '20 at 01:26
  • RFC788 says verbatim: "Each 7-bit character is transmitted right justified in an 8-bit byte (or octet) with the high order bit cleared to zero." – Cyril Graze Oct 19 '20 at 15:43
  • Right, but that is just describing what is meant by "7bit". It is not suggesting that server implementations of the SMTP protocol clear the MSB as they receive data from the client. – jstedfast Oct 19 '20 at 16:35
  • In RFC1428: << RFC 821 defines a 7bit transport. A transport agent which does not clear the high order bit upon receipt of octets with this bit set in SMTP messages is called 8 bit transparent in this document. An implementation of the general SMTP Extensions document [3] and the 8bit extensions protocol [4] which passes MIME messages using all 8 bits of an octet is called 8bit ESMTP. An implementation of extended SMTP which does not accept 8bit characters is called 7bit ESMTP. >> That would indicate that in older 7-bit implementations, the SMTP gateway would indeed clear the MSB de facto? – Cyril Graze Oct 19 '20 at 16:56
  • You appear to be correct, then, yes. That said, ESMTP servers that support the 8BITMIME extension are the only SMTP servers that your code should be sending 8bit message data to. Everything else should be handled by encoding the message text (and headers) to be 7bit clean. – jstedfast Oct 19 '20 at 17:01
  • So how could I go about finding this out with SMTP servers today: If I send a quoted printable 7-bit message, how can I see for sure if the message is sent with the MSB cleared or not? – Cyril Graze Oct 19 '20 at 20:08
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/223318/discussion-between-jstedfast-and-cyril-graze). – jstedfast Oct 19 '20 at 20:11
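
As a rough illustration of the distinction discussed in the comments above, the following Python sketch simulates a relay that clears the high-order bit, in the sense of the RFC 1428 wording quoted there. The helpers `strip_high_bit` and `has_high_bit` and the sample body are hypothetical; this models the behaviour on plain byte strings and does not probe any real server.

```python
import quopri


def strip_high_bit(data: bytes) -> bytes:
    """Simulate a 7-bit transport that clears the MSB of every octet."""
    return bytes(b & 0x7F for b in data)


def has_high_bit(data: bytes) -> bool:
    """Return True if any octet has its MSB set."""
    return any(b & 0x80 for b in data)


# Hypothetical message body with non-ASCII characters.
body_8bit = "Grüße".encode("utf-8")
body_qp = quopri.encodestring(body_8bit)

# A quoted-printable body has no high bits, so stripping changes nothing.
print(has_high_bit(body_qp), strip_high_bit(body_qp) == body_qp)

# A raw 8-bit body is altered by a stripping relay.
print(has_high_bit(body_8bit), strip_high_bit(body_8bit) == body_8bit)
```

One consequence of this model: a message that is already 7bit clean arrives identical through a stripping relay and through an 8-bit transparent one, so sending a quoted-printable message cannot by itself reveal which kind of relay handled it. Only a body containing octets above 127 would show a difference.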