It's on the right track, but it's wrong.
"=?UTF-8?Q?B=C3=B6rge_M=C3=B6ller?="<boerge.moeller@foo.org>
There are 2 problems with the above form:
- The encoded-word (the =?UTF-8?Q?...?= bit) is quoted and shouldn't be. Mail software that parse this address won't decode that token if they are standards-compliant.
- The "name" is butted up against the angle brackets and should not be. There MUST be a space in order to be standards compliant.
In other words, this is what it should look like:
=?UTF-8?Q?B=C3=B6rge_M=C3=B6ller?= <boerge.moeller@foo.org>
The RFCs that you need to look at are:
- RFC5322 - this defines the modern Message syntax that is implemented by the server you are trying to interoperate with.
- RFC2047 - this defines the methods and syntax of the encoded-words that are needed to represent non-ASCII characters in headers like
Subject
and address headers (e.g. To/From/Cc/Reply-To/etc). (This is the =?UTF-8?Q?B=C3=B6rge_M=C3=B6ller?=
part)
- RFC822 - this defines the grammar used by RFC2047 and is an older version of RFC5322.
It may also be helpful to read RFC2822 which is newer than RFC822 but older than RFC5322. My guess, however, is that you can skip it because it won't have a lot of value. The only reason RFC822 still has value is because of its much older grammar definitions that are referenced by RFC2047 (such as atom
, dot-atom
, phrase
, angle-addr
, addr-spec
, tspecials
, etc).
RFC6532 is even newer than RFC5322. The purpose of which is to remove the need to encode headers altogether by allowing the use of UTF-8 as an alternative.
Before RFC6532, there was no standard for the character encoding to use for headers other than ASCII (which was what RFC822 used) and so everything was always supposed to conform to ASCII.
A lot of software doesn't follow the standards, however, and so there was a lot of mail in the real world that used ISO-8859-1 and every other character encoding under the sun, all depending on what region the user(s) were in and what character encoding(s) were in wide use in those regions (e.g. Big5 and GB2312 are popular in various parts of China, Shift-JIS being popular in Japan, EUC-KR/KS-C-5601-1987 are popular in Korea, etc).
This caused major interoperability problems, though, not least of which because not every mail client could handle every character encoding under the sun, but also because there was no way for a client to figure out which character encoding was even being used! It's all just binary gobbeldy-gook.
UTF-8, however, has existed for a long time and it can represent all characters in all languages, so it was only logical for it to eventually win out as the standard character encoding to use for international email.