When a user registers on a site, should we use EncodeForHTML() or EncodeForURL() before storing the value in a DB?

The reason I ask this is that when I send an e-mail to someone that includes a URL that contains an email address as a URL variable, I have to use EncodeForURL(). But if this email address is already encoded using EncodeForHTML(), it will mean I have to Canonicalize() it before using EncodeForURL() on it again.

I would therefore think that EncodeForURL() is probably good, but is it 'safe' and 'correct' when storing the value in a database?

Update: Upon reading the docs, I see that EncodeForURL() is only for using a value in a URL. Therefore it seems to make sense that I should store the value encoded with EncodeForHTML(), then Canonicalize() it and re-encode it with EncodeForURL() when using it in a URL context. I don't know how much of a performance hit all this encoding is going to put on my server.

volume one
  • I can't think of a single reason to not store an email address as simple text. But then again, I can't think of any situation where I would send an e-mail to someone that includes a URL that contains an email address as a URL variable. – Dan Bracuk Apr 07 '15 at 21:00
  • Imagine you're sending a verification email to a user to confirm their registration. In the return URL placed in the email will be the user's email address (URL Encoded) and a verification key value. The reason not to store it as simple text is to prevent XSS attacks – volume one Apr 07 '15 at 21:07
  • I think I would opt for a hash or something encrypted instead of an email address. I just want to avoid passing personal info around via email and links like that. – Mark A Kruger Apr 07 '15 at 21:27

2 Answers

Copying this from my company's internal documentation. Not sure if the images uploaded correctly since Imgur is blocked at work. If not, I'll re-upload them later. I'll be publishing this and more related content to a GitHub repo in the future.


You should store it as simple text, but make sure you scrub your data on the way in using an AntiSamy library. Once the data is safe, make sure to encode the data on the way out using the proper encoder. And FYI, there's a big difference between the output of encodeForHTML() and encodeForHTMLAttribute().
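A minimal sketch of that flow, assuming ColdFusion 11 or later (where getSafeHTML() provides the built-in AntiSamy scrubbing). The datasource name `myDSN` and the `users` table are hypothetical placeholders, not part of the original documentation.

```cfml
<!--- On the way in: scrub, validate, then store as plain text. --->
<!--- "myDSN" and the "users" table are hypothetical names. --->
<cfset cleanEmail = getSafeHTML(trim(form.email)) />

<cfif isValid("email", cleanEmail)>
    <cfquery datasource="myDSN">
        INSERT INTO users (email)
        VALUES (<cfqueryparam value="#cleanEmail#" cfsqltype="cf_sql_varchar" />)
    </cfquery>
</cfif>

<!--- On the way out: pick the encoder that matches the output context. --->
<cfoutput>
    <p>#encodeForHTML(cleanEmail)#</p>
    <input type="text" name="email" value="#encodeForHTMLAttribute(cleanEmail)#" />
</cfoutput>
```

Because the stored value is plain text, comparing it against a later form submission only requires scrubbing the new input the same way; nothing has to be decoded first.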

In the below examples, substitute the variables that define email addresses with data from the DB.


PROTIP: Don't use these encoders in CFFORM tags. Those tags take care of the encoding for you. CF 9 and below use HTMLEditFormat(), CF 10 and above most likely use encodeForHTMLAttribute().


Simple Implementation

A basic implementation is to include a single e-mail address in order to populate the "To" field of a new e-mail window.

CFML

<cfset email = "someone@example.com" />
<a href="mailto:#email#">E-mail</a>

HTML Output

<a href="mailto:someone@example.com">E-mail</a>

CFML with Proper Encoding

<cfset email = "someone@example.com" />
<a href="mailto:#encodeForURL(email)#">E-mail</a>

Encoded HTML Output

Notice that the "@" symbol is properly percent encoded as "%40".

<a href="mailto:someone%40example.com">E-mail</a>

Results when clicked

[Screenshot: Simple Implementation results when clicked]

And if you plan on showing the e-mail address on the page as part of the link:

<cfset email = "someone@example.com" />
<a href="mailto:#encodeForURL(email)#">#encodeForHTML(email)#</a>

Attack Vector

An advanced implementation includes e-mail addresses for "To" & "CC". It can also pre-populate the body and subject of the new e-mail.

CFML without encoding

<cfset email = "someone@example.com" />
<cfset email_cc = "someone_else@example.com" />
<cfset subject = "This is the subject" />
<cfset body = "This is the body" />
<a href="mailto:#email#?cc=#email_cc#&subject=#subject#&body=#body#">E-mail</a>

HTML Output

<a href="mailto:someone@example.com?cc=someone_else@example.com&subject=This is the subject&body=This is the body">E-mail</a>

Results when clicked

[Screenshot of the new e-mail window]

Notice that the subject and body parameters contain spaces. While this string will technically work, it is still open to attack.

Imagine the value of body is set by the result of a database query. This record has been "infected" by a malicious user and the default body message has an appended "BCC" address, so some evil user can get copies of e-mails sent via this link.

Infected Data

<cfset body = "This is the body&bcc=someone@evil.com" />

HTML Output

<a href="mailto:someone@example.com?cc=someone_else@example.com&subject=This is the subject&body=This is the body&bcc=someone@evil.com">E-mail</a>

Results when clicked

[Screenshot of the new e-mail window]

In order to stop this MAILTO link from being infected, this string needs to be properly encoded.

CFML with HTML Attribute Encoding

Since "href" is an attribute of the <a> tag, you might think to use the HTML Attribute encoder. This would be incorrect.

<cfset email = "someone@example.com" />
<cfset email_cc = "someone_else@example.com" />
<cfset subject = "This is the subject" />
<cfset body = "This is the body&bcc=someone@evil.com" />
<a href="mailto:#encodeForHTMLAttribute(email)#?cc=#encodeForHTMLAttribute(email_cc)#&subject=#encodeForHTMLAttribute(subject)#&body=#encodeForHTMLAttribute(body)#">E-mail</a>

HTML Output

<a href="mailto:someone&#x40;example.com?cc=someone_else&#x40;example.com&subject=This&#x20;is&#x20;the&#x20;subject&body=This&#x20;is&#x20;the&#x20;body&amp;bcc&#x3d;someone&#x40;evil.com">E-mail</a>

Results when clicked

[Screenshot of the new e-mail window]

CFML with URL Encoding

The correct encoding of a MAILTO link is done with the URL encoder.

<cfset email = "someone@example.com" />
<cfset email_cc = "someone_else@example.com" />
<cfset subject = "This is the subject" />
<cfset body = "This is the body&bcc=someone@evil.com" />
<a href="mailto:#encodeForURL(email)#?cc=#encodeForURL(email_cc)#&subject=#encodeForURL(subject)#&body=#encodeForURL(body)#">E-mail</a>

HTML Output with Correct Encoding

Notice these things about the URL encoder:

  1. Each space (" ") is converted to a plus sign ("+") instead of its expected percent value ("%20").
  2. Encoding is otherwise done using percent ("%") values.
  3. Since the individual query parameters are encoded, the ampersands ("&") connecting each parameter were not encoded.
  4. When the "body" parameter is encoded, it includes the "&bcc=" string that was maliciously injected. This entire string is now part of the message body, which prevents the unintended "bcc" of the e-mail.

<a href="mailto:someone%40example.com?cc=someone_else%40example.com&subject=This+is+the+subject&body=This+is+the+body%26bcc%3Dsomeone%40evil.com">E-mail</a>

Results when clicked

[Screenshot of the new e-mail window]

What's with the plus signs? It is up to the individual mail client (e.g. Outlook, GMail, etc.) to correctly decode these URL encoded values.
Adrian J. Moreno
  • Yup. And the individual query string parameters should you use them. – Adrian J. Moreno Apr 07 '15 at 21:32
  • Is there no point to storing the value as encoded text in the DB? I've been storing everything else in HTML-encoded format, e.g. a product description. – volume one Apr 07 '15 at 21:34
  • The problem with doing that is that you could end up with double-encoded text on the output. You should scrub your data going in and encode it on the way out. – Adrian J. Moreno Apr 07 '15 at 21:35
  • Sometimes a user will want to reset their password. They need to enter their email address, which I need to check against the DB, and then if there's a match I will send them a new password reset form. So when I'm conducting this match, I'd have to encode both the stored DB email address and their form input email address right? – volume one Apr 07 '15 at 21:40
  • If you're going to encode the stored email, then yes. But if possible, save it as plain, scrubbed, text. Then you just scrub the form data (email) and compare. – Adrian J. Moreno Apr 07 '15 at 21:44
  • I saw in my code that I'm using `isValid("email", form.email)`. Is this good enough or do I still need to use an AntiSamy library? [CF11 comes with AntiSamy functionality built-in which was a nice surprise] – volume one Apr 13 '15 at 21:56
  • The easiest way to implement AntiSamy is to loop through all of your FORM and URL scoped variables at the top of onRequestStart() in your Application.cfc. [form.foo = antiSamy(form.foo)] No reason to do them one at a time in some specific processing code. You can still use isValid() afterwards. – Adrian J. Moreno Apr 14 '15 at 15:02
  • Why wouldn't you want to encode the email address before storing in a DB? Looking at this question here it seems email addresses can be used for XSS http://stackoverflow.com/questions/17480809/are-xss-attacks-possible-through-email-addresses – volume one Apr 16 '15 at 11:12
  • In that accepted answer: "You need to ensure that arbitrary user input is sanitized before being rendered." You should really ensure that arbitrary user input is sanitized _before it is stored_. Why would you save bad data, encoded or otherwise? An AntiSamy plugin would remove the script tag entirely, leaving an invalid email address. – Adrian J. Moreno Apr 16 '15 at 15:18
  • I meant after performing the AntiSamy... isn't it still a good idea to encode before storing? – volume one Apr 16 '15 at 18:04
  • No, store it as text, encode on output. Why? Because _how_ you encode it depends on _context_. That's why there are so many encoding functions. HTML != HTMLAttribute != JavaScript != CSS != XML (etc.) – Adrian J. Moreno Apr 16 '15 at 21:21
  • Sorry to carry this on... but when I was storing articles written by users, I did EncodeForHTML() on text coming from a textarea input. From reading the CF docs about XSS attacks, I thought this was good practice. So what would one do about storing paragraphs of text in a database? Should it not be stored in an encoded format? My current application flow is to Canonicalize() anything taken out of the database and then encode it for whatever context is needed, e.g. EncodeForURL(). Hence even though it's HTML encoded in the DB, it gets cleaned up and then re-encoded depending on context. Is this wrong? – volume one Apr 18 '15 at 14:59
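
The onRequestStart() approach suggested in the comments above could be sketched roughly as follows. This assumes ColdFusion 11 or later and uses the built-in getSafeHTML() function in place of the antiSamy() call written in the comment; the application name is a hypothetical placeholder.

```cfml
<!--- Application.cfc: a sketch only, not a complete application setup. --->
<cfcomponent output="false">

    <!--- Hypothetical application name. --->
    <cfset this.name = "exampleApp" />

    <cffunction name="onRequestStart" returntype="boolean" output="false">
        <cfargument name="targetPage" type="string" required="true" />
        <cfset var key = "" />

        <!--- Scrub every simple value submitted in the FORM scope. --->
        <cfloop collection="#form#" item="key">
            <cfif isSimpleValue(form[key])>
                <cfset form[key] = getSafeHTML(form[key]) />
            </cfif>
        </cfloop>

        <!--- Scrub every simple value passed in the URL scope. --->
        <cfloop collection="#url#" item="key">
            <cfif isSimpleValue(url[key])>
                <cfset url[key] = getSafeHTML(url[key]) />
            </cfif>
        </cfloop>

        <cfreturn true />
    </cffunction>

</cfcomponent>
```

Scrubbing once at the top of the request means individual pages can still run checks such as isValid("email", form.email) afterwards on values that have already been cleaned.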
Store the email addresses in plain text, then encode them when you use them, depending on the context. If it's going to be a part of URL, use EncodeForURL(). If it's going to be displayed in HTML as text, use EncodeForHtml().
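
As a rough illustration for the verification e-mail described in the question, the sketch below builds the return URL from the plain-text address. The domain, page name, key value, and sender address are all hypothetical placeholders.

```cfml
<!--- Plain-text value, as it would come out of the database. --->
<cfset email = "someone@example.com" />
<!--- Hypothetical verification key. --->
<cfset verifyKey = "hypothetical-key-123" />

<!--- URL context: encode each query string value with encodeForURL(). --->
<cfset verifyURL = "https://www.example.com/verify.cfm?email=" & encodeForURL(email)
    & "&key=" & encodeForURL(verifyKey) />

<!--- Plain-text e-mail body: the URL is used as-is, with no HTML encoding. --->
<cfmail to="#email#" from="noreply@example.com" subject="Confirm your registration" type="text">
Please confirm your registration by visiting:
#verifyURL#
</cfmail>

<!--- HTML context: the same plain-text value, encoded for display. --->
<cfoutput>
    <p>A confirmation link was sent to #encodeForHTML(email)#.</p>
</cfoutput>
```

Since the stored value was never encoded, there is no Canonicalize() step before either use.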

Henry