11

I am about to send an HTML email in code that may contain unsafe user input. I have noticed that if I HTML-escape the subject, Gmail will then display the escaped content (so if my subject is "This & That", which I sanitize as "This &amp; That", Gmail shows the latter). The same goes for Thunderbird. Is it safe to assume that no email client needs the subject HTML-escaped?

Joel Etherton
Samer Buna

3 Answers

12

There is no need to encode HTML entities in a subject line. The reason for encoding in the HTML body is that & is a reserved character there: it starts a character reference in HTML, and in XHTML, which derives from XML, a bare & is not well-formed.

However, the subject line of an email is not in HTML, XML, or XHTML. It's just pure text. Because of this, you don't need to encode an ampersand as &amp;. If you do encode it, because it's not being parsed as HTML, it will be displayed as encoded.

If you want to include non-ASCII characters (e.g. £), then you need to encode the whole 'envelope' (including the email body) as UTF-8.

So, in code, the following will be shown as:

In code | Subject Line |    Body
====================================
&amp;   |    &amp;     |     &
&       |      &       |     &
UTF-8 £ |      £       |     £
ASCII £ |     n/a      |    n/a
&pound; |   &pound;    |     £

NB: Microsoft Office has a weird implementation of UTF-8, so not all UTF-8 characters will work.
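As a rough illustration of the split described above, here is a minimal Python sketch using the standard library's `email.message.EmailMessage`. The function name `build_message` and the example addresses are assumptions made up for this example; the point is only that the subject is passed through as plain text while user input in the HTML body is escaped.

```python
import html
from email.message import EmailMessage


def build_message(subject: str, body_text: str) -> EmailMessage:
    """Build an HTML email; hypothetical helper for illustration only."""
    msg = EmailMessage()
    # The subject is a plain-text header, so it is NOT HTML-escaped.
    msg["Subject"] = subject
    msg["From"] = "sender@example.com"      # placeholder address
    msg["To"] = "recipient@example.com"     # placeholder address
    # The body is parsed as HTML, so user input is escaped before insertion.
    msg.set_content(f"<p>{html.escape(body_text)}</p>", subtype="html")
    return msg


msg = build_message("This & That", "This & That <script>")
```

With this approach the subject header still reads "This & That", while the body carries `This &amp; That &lt;script&gt;`, which an HTML-rendering client displays as the original text.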

Dan Blows
  • While you don't need to HTML-encode the subject, be sure to sanitize it for the context it's in: an email header. Remove line break characters like `\n` and `\r`! – Michael Butler Jun 30 '17 at 14:33
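The line-break concern raised in the comment above could be handled along these lines. This is only a sketch: `sanitize_subject` is a hypothetical helper name, and replacing line breaks with a space (rather than rejecting the input outright) is an assumption about the desired behaviour.

```python
import re


def sanitize_subject(raw: str) -> str:
    """Collapse CR/LF runs so user input cannot start a new header line."""
    # Any run of \r and/or \n becomes a single space; outer whitespace is trimmed.
    return re.sub(r"[\r\n]+", " ", raw).strip()


safe = sanitize_subject("Hello\r\nBcc: someone@example.com")
```

After sanitizing, the injected "Bcc:" text is just part of the subject string instead of a separate header line.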
3

Subjects don't need to be HTML-escaped; however, they can be encoded. The original message format spec defined that messages should be ASCII (RFC822 Section 3). This was confirmed in RFC2822; however, RFC2047 defines header extensions that allow you to encode header fields within MIME messages to allow non-ASCII text.

This encoded-word format allows you to use encodings like utf-8 within email headers. For example:

=?iso-8859-1?q?this=20is=20some=20text?=

This is the ISO-8859-1 encoded version of "this is some text". This method can be used to encode things like the pound symbol (£) or accented / non-ASCII characters.
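For a concrete feel of the encoded-word format, here is a small Python sketch using the standard library's `email.header` module. The helper name `decode_subject` and the sample subject "£5 off" are assumptions made up for this example; most mail libraries perform this encoding automatically when a header contains non-ASCII text.

```python
from email.header import Header, decode_header, make_header


def decode_subject(raw_header: str) -> str:
    """Decode RFC 2047 encoded-words back into readable text."""
    return str(make_header(decode_header(raw_header)))


# Decode the encoded-word shown above.
decoded = decode_subject("=?iso-8859-1?q?this=20is=20some=20text?=")

# Encode a subject containing a pound sign as a UTF-8 encoded-word.
encoded = Header("£5 off", charset="utf-8").encode()
round_trip = decode_subject(encoded)
```

Here `decoded` is "this is some text", `encoded` is an ASCII-only string starting with `=?utf-8?`, and `round_trip` recovers the original "£5 off".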

Community
Steve Smith
2

The Subject field is outside of the "HTML content". It does not have anything to do with HTML.


However, I do not have a reference for this...