
I know that the escape function has been deprecated and that you should use encodeURI or encodeURIComponent instead. However, encodeURI and encodeURIComponent don't do the same thing as escape.

I want to create a mailto link in JavaScript with the Swedish characters åäö. Here is a comparison between escape, encodeURIComponent and encodeURI:

var subject="My subject with åäö";
var body="My body with more characters and swedish åäö";

console.log("mailto:?subject="+escape(subject)+"&body=" + escape(body));
console.log("mailto:?subject="+encodeURIComponent(subject)+"&body=" + encodeURIComponent(body));
console.log("mailto:?subject="+encodeURI(subject)+"&body=" + encodeURI(body));  
Output:
mailto:?subject=My%20subject%20with%20%E5%E4%F6&body=My%20body%20with%20more%20characters%20and%20swedish%20%E5%E4%F6
mailto:?subject=My%20subject%20with%20%C3%A5%C3%A4%C3%B6&body=My%20body%20with%20more%20characters%20and%20swedish%20%C3%A5%C3%A4%C3%B6
mailto:?subject=My%20subject%20with%20%C3%A5%C3%A4%C3%B6&body=My%20body%20with%20more%20characters%20and%20swedish%20%C3%A5%C3%A4%C3%B6 

Only the mailto link created with "escape" opens a properly formatted mail in Outlook using IE or Chrome. When using encodeURI or encodeURIComponent the subject says:

My subject with Ã¥Ã¤Ã¶

and the body is also looking messed up.
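One plausible explanation for the garbled text (an assumption; the question does not confirm what Outlook does internally) is that the mail client reads the UTF-8 percent-escapes produced by encodeURIComponent() as single Latin-1/Windows-1252 bytes. That misreading can be reproduced in plain JavaScript; `decodeUtf8BytesAsLatin1` is a hypothetical helper name:

```javascript
// Assumption: the mail client decodes each %XX as one Latin-1 / Windows-1252
// character instead of decoding the %XX sequence as UTF-8.
function decodeUtf8BytesAsLatin1(text) {
  // encodeURIComponent() emits one %XX per UTF-8 byte ...
  const utf8Escaped = encodeURIComponent(text);
  // ... and the legacy unescape() turns every %XX into the single
  // character U+00XX, i.e. it reads each byte as Latin-1.
  return unescape(utf8Escaped);
}

console.log(decodeUtf8BytesAsLatin1("My subject with åäö"));
// My subject with Ã¥Ã¤Ã¶
```

Each of å, ä and ö is two bytes in UTF-8 (C3 A5, C3 A4, C3 B6), so each turns into two characters when the bytes are read one at a time.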

Is there some other function besides escape that I can use to get the working mailto link?

HoldOffHunger
gusjap
  • What encoding are you using, have you tried using `utf-8`? – Cyclonecode Oct 13 '14 at 14:20
  • I'm using UTF-8 encoding. – gusjap Oct 13 '14 at 14:28
    I just noticed that escape is not working in Firefox, so I'll have to use encodeURIComponent in the Firefox case. Error in Firefox: NS_ERROR_ILLEGAL_VALUE: Component returned failure code: 0x80070057 (NS_ERROR_ILLEGAL_VALUE) [nsIDOMLocation.href] – gusjap Oct 13 '14 at 14:31
    The best solution I've come up with is to define my own escape function instead of using the deprecated one. Here is one example of how the escape function could be implemented: http://cwestblog.com/2011/05/23/escape-unescape-deprecated/ – gusjap Oct 13 '14 at 15:03

4 Answers


escape() is defined in section B.2.1.2 (escape) of the ECMAScript specification, and the introduction to Annex B says:

... All of the language features and behaviours specified in this annex have one or more undesirable characteristics and in the absence of legacy usage would be removed from this specification. ...

For characters whose code unit value is 0xFF or less, escape() produces a two-digit escape sequence: %xx. This basically means that escape() converts a string containing only characters from U+0000 to U+00FF into a percent-encoded string using the Latin-1 encoding.
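Both output formats of escape() (the two-digit %xx form and the four-digit %uxxxx form) can be seen in a small re-implementation. `escapeLike` is a hypothetical name, and this is only a sketch of the legacy algorithm for illustration, not a useful replacement; it keeps A–Z, a–z, 0–9 and @*_+-./ as-is and escapes every other UTF-16 code unit:

```javascript
// Sketch of the legacy escape() algorithm, for illustration only.
// Characters in this set are left unescaped by escape().
const UNESCAPED = /[A-Za-z0-9@*_+\-.\/]/;

function escapeLike(str) {
  let out = "";
  // Iterate over UTF-16 code units, exactly like escape() does.
  for (let i = 0; i < str.length; i++) {
    const ch = str[i];
    const unit = str.charCodeAt(i);
    if (UNESCAPED.test(ch)) {
      out += ch; // left as-is
    } else if (unit <= 0xff) {
      // Latin-1 range: two-digit %XX form
      out += "%" + unit.toString(16).toUpperCase().padStart(2, "0");
    } else {
      // Everything above U+00FF: non-standard four-digit %uXXXX form
      out += "%u" + unit.toString(16).toUpperCase().padStart(4, "0");
    }
  }
  return out;
}

console.log(escapeLike("åäö"));  // %E5%E4%F6
console.log(escapeLike("a b€")); // a%20b%u20AC
```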

For characters with a greater code unit, the four-digit format %uxxxx is used. This is not allowed within the hfields section (where subject and body are stored) of a mailto: URI (as defined in RFC 6068):

mailtoURI    = "mailto:" [ to ] [ hfields ]
to           = addr-spec *("," addr-spec )
hfields      = "?" hfield *( "&" hfield )
hfield       = hfname "=" hfvalue
hfname       = *qchar
hfvalue      = *qchar
...
qchar        = unreserved / pct-encoded / some-delims
some-delims  = "!" / "$" / "'" / "(" / ")" / "*"
               / "+" / "," / ";" / ":" / "@"

unreserved and pct-encoded are defined in STD 66 (RFC 3986):

unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded   = "%" HEXDIG HEXDIG

A percent sign is only allowed if it is directly followed by two hex digits; a percent sign followed by u is not allowed.
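The hfvalue rule can be turned into a quick check. This is a sketch assuming a regular expression is an adequate reading of *qchar (`isValidHfvalue` is a hypothetical name); it shows that the %uXXXX form fails the rule while UTF-8 %XX sequences pass:

```javascript
// Sketch of RFC 6068: hfvalue = *qchar, where
// qchar = unreserved / pct-encoded / some-delims
const HFVALUE = /^(?:[A-Za-z0-9\-._~!$'()*+,;:@]|%[0-9A-Fa-f]{2})*$/;

function isValidHfvalue(value) {
  return HFVALUE.test(value);
}

console.log(isValidHfvalue(escape("åäö")));           // true  (%E5%E4%F6, Latin-1 bytes)
console.log(isValidHfvalue(escape("€")));             // false (%u20AC)
console.log(isValidHfvalue(encodeURIComponent("€"))); // true  (%E2%82%AC, UTF-8 bytes)
```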

Using a self-implemented version that behaves exactly like escape() doesn't solve anything; you might as well continue to use escape(), which won't be removed anytime soon.



To summarise: your previous usage of escape() generated Latin-1 percent-encoded mailto URIs if all characters were in the range U+0000 to U+00FF; otherwise it generated an invalid URI (which some applications might still interpret correctly, if they were built with compatibility with JavaScript's escape()/unescape() in mind).

It is more correct (no risk of creating invalid URIs) and more future-proof to generate UTF-8 percent-encoded mailto URIs using encodeURIComponent() (don't use encodeURI(); it does not escape ?, /, ...). RFC 6068 requires the use of UTF-8 in many places (but allows other encodings for "MIME encoded words and for bodies in composed email messages").
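A minimal sketch of building a complete mailto URI with several hfields this way. `buildMailto` is a hypothetical helper, and the `to` part is passed through unchanged, which assumes the address needs no escaping:

```javascript
// Hypothetical helper: every hfname and hfvalue is UTF-8
// percent-encoded with encodeURIComponent(), then joined with "&".
function buildMailto(to, fields) {
  const hfields = Object.entries(fields)
    .map(([name, value]) => encodeURIComponent(name) + "=" + encodeURIComponent(value))
    .join("&");
  // "to" is used verbatim here (assumption: a plain address).
  return "mailto:" + to + (hfields ? "?" + hfields : "");
}

console.log(buildMailto("", { subject: "My subject with åäö", body: "Hej åäö" }));
// mailto:?subject=My%20subject%20with%20%C3%A5%C3%A4%C3%B6&body=Hej%20%C3%A5%C3%A4%C3%B6
```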

Example:

const text_latin1 = "Swedish åäö";
const text_other = "Emoji ";

document.getElementById('escape-latin-1-link').href="mailto:?subject="+escape(text_latin1);
document.getElementById('escape-other-chars-link').href="mailto:?subject="+escape(text_other);
document.getElementById('utf8-link').href="mailto:?subject="+encodeURIComponent(text_latin1);
document.getElementById('utf8-other-chars-link').href="mailto:?subject="+encodeURIComponent(text_other);

function mime_word(text) {
  const q_encoded = encodeURIComponent(text) // to UTF-8 percent-encoding
    .replace(/[_!'()*]/g, function (c) { return '%' + c.charCodeAt(0).toString(16).toUpperCase(); }) // encode a few more chars
    .replace(/%20/g, '_') // MIME Q-encoding uses underscore for space
    .replace(/%/g, '='); // MIME Q-encoding uses an equals sign instead of percent
  return encodeURIComponent('=?utf-8?Q?' + q_encoded + '?='); // wrap as a MIME encoded word, then escape for the URI
}

//don't use mime_word for body!!!
document.getElementById('mime-word-link').href="mailto:?subject="+mime_word(text_latin1);
document.getElementById('mime-word-other-chars-link').href="mailto:?subject="+mime_word(text_other);
<a id="escape-latin-1-link">escape()-latin1</a><br/>
<a id="escape-other-chars-link">escape()-emoji</a><br/>
<a id="utf8-link">utf8</a><br/>
<a id="utf8-other-chars-link">utf8-emoji</a><br/>
<a id="mime-word-link">mime-word</a><br/>
<a id="mime-word-other-chars-link">mime-word-emoji</a><br/>

For me, the UTF-8 links and the MIME-word links work in Thunderbird. Only the plain UTF-8 links work in the built-in Windows 10 Mail app and in my up-to-date version of Outlook.

Community
T S
  • In a test of Chrome 74.0.3729.169 (64-bit) on Windows 10, only the UTF-8 links produced the correct Unicode characters. The others resulted either in replacement characters (�) or in the %hex escape codes being kept as-is. – C Perkins Jun 06 '19 at 19:27
  • @CPerkins You specify the browser you used to click on the links, but AFAIK the behavior is independent of the browser. The result does, however, depend on the mail client that is opened when interpreting `mailto:` links. It would therefore be nice if you could specify which mail client is opened when you click on the links. If the links open a new tab in your browser, that means a webmail service is registered as the `mailto:` handler. In that case, please specify which webmail service is used (gmail.com, outlook.com, roundcube, ...?). – T S Jun 07 '19 at 10:05
  • I intended to mention that I was using a browser-based mail client, so in that case the browser's behavior and initial interpretation of the URI is definitely relevant. To explain further, the mailto URIs were produced and clicked in Firefox 67 which opened Chrome. The browser is not irrelevant anyway, since this is all about producing the correct link within the browser, so between producing the URI to passing it off to the mailto client, I suppose there is a dependence on the browser there too. FYI, I confirmed that Thunderbird correctly interprets the UTF-8 and mime-word URIs. – C Perkins Jun 07 '19 at 16:50

To quote the MDN Documentation directly...

This function was used mostly for URL queries (the part of a URL following ?)—not for escaping ordinary String literals, which use the format "\xHH". (HH are two hexadecimal digits, and the form \xHH\xHH is used for higher-plane Unicode characters.)

The problem you are experiencing is because escape() does not support UTF-8, while encodeURI() and encodeURIComponent() do.

But to be absolutely clear: never use plain encodeURI() or encodeURIComponent(). Let's just try it out:

console.log(encodeURIComponent('@#*'));

Input: @#*. Output: %40%23*. Ordinarily, once user input is cleansed, I feel like I can trust it. But if I ran rm * on my Linux system to delete a file specified by a user, that would literally delete all files on my system, even if I did the encoding entirely server-side. This is a massive bug in encodeURI() and encodeURIComponent(), which the MDN Web Docs clearly point out, along with a solution.

Use fixedEncodeURI() when trying to encode a complete URL (e.g., all of example.com?arg=val), as defined and further explained in the MDN encodeURI() documentation...

function fixedEncodeURI(str) {
   return encodeURI(str).replace(/%5B/g, '[').replace(/%5D/g, ']');
}
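As the MDN page explains it, this wrapper exists because RFC 3986 reserves square brackets for IPv6 host literals, which plain encodeURI() would percent-encode. A short check, with the function repeated so the snippet runs standalone and an arbitrary example URL:

```javascript
// Same wrapper as above, repeated so this snippet runs on its own.
function fixedEncodeURI(str) {
  return encodeURI(str).replace(/%5B/g, "[").replace(/%5D/g, "]");
}

// Example URL with an IPv6 loopback host and non-ASCII query text
const url = "http://[::1]:8080/search?q=åäö";
console.log(encodeURI(url));      // http://%5B::1%5D:8080/search?q=%C3%A5%C3%A4%C3%B6
console.log(fixedEncodeURI(url)); // http://[::1]:8080/search?q=%C3%A5%C3%A4%C3%B6
```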

Or you may need to use fixedEncodeURIComponent() when trying to encode part of a URL (e.g., the arg or the val in example.com?arg=val), as defined and further explained in the MDN encodeURIComponent() documentation...

function fixedEncodeURIComponent(str) {
  return encodeURIComponent(str).replace(/[!'()*]/g, function(c) {
    // uppercase hex, matching encodeURIComponent()'s own %XX output
    return '%' + c.charCodeAt(0).toString(16).toUpperCase();
  });
}
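A quick check of the '@#*' input from above. The wrapper is repeated here with uppercase hex digits (so the output matches encodeURIComponent()'s own %XX style) so the snippet runs on its own:

```javascript
// fixedEncodeURIComponent, repeated so this snippet runs on its own.
function fixedEncodeURIComponent(str) {
  return encodeURIComponent(str).replace(/[!'()*]/g, function (c) {
    return "%" + c.charCodeAt(0).toString(16).toUpperCase();
  });
}

console.log(encodeURIComponent("@#*"));               // %40%23*
console.log(fixedEncodeURIComponent("@#*"));          // %40%23%2A
console.log(fixedEncodeURIComponent("it's (fine)!")); // it%27s%20%28fine%29%21
```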

If you are having trouble distinguishing what fixedEncodeURI(), fixedEncodeURIComponent(), and escape() do, I always like to simplify it with:

  • fixedEncodeURI(): will not encode +@?=:#;,$& to their percent-encoded equivalents (as & and + are common URL operators).
  • fixedEncodeURIComponent(): will encode +@?=:#;,$& to their percent-encoded equivalents.
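The difference matters when a value itself contains those operators. A sketch with a made-up URL and value, both wrappers repeated so it runs standalone:

```javascript
// Both wrappers repeated so this snippet runs on its own.
function fixedEncodeURI(str) {
  return encodeURI(str).replace(/%5B/g, "[").replace(/%5D/g, "]");
}

function fixedEncodeURIComponent(str) {
  return encodeURIComponent(str).replace(/[!'()*]/g, function (c) {
    return "%" + c.charCodeAt(0).toString(16).toUpperCase();
  });
}

// Hypothetical search value containing the URL operators & and =
const value = "fish & chips=tasty";

// Whole-URL encoding keeps & and =, so the value is split into two parameters:
console.log(fixedEncodeURI("https://example.com/?q=" + value));
// https://example.com/?q=fish%20&%20chips=tasty

// Encoding only the value keeps it intact:
console.log("https://example.com/?q=" + fixedEncodeURIComponent(value));
// https://example.com/?q=fish%20%26%20chips%3Dtasty
```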
HoldOffHunger
  • encodeURI is not a function to escape characters for usage in a shell (and neither was escape), it is used to encode data into a format suitable for usage as a URI parameter. Your "fixedEncodeURI" doesn't change anything about that either. – T S Jul 19 '22 at 19:51

Maybe useful:

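/**
 * Escape a non-printable char code for a JS string literal.
 * e.g. 1 -> "\x01", 0xAAAA -> "\uaaaa", 13 -> "\r", 65 -> "A"
 * Named escapeCharCode so it does not shadow the global escape().
 * @param {number} code  a UTF-16 code unit (0 to 0xFFFF)
 * @returns {string}
 */
function escapeCharCode(code) {
    if (code > 0xff) {
        // \x only takes two hex digits, so use \uXXXX above U+00FF
        return `\\u${code.toString(16).padStart(4, '0')}`
    }
    if (code > 0x7f) {
        return `\\x${code.toString(16).padStart(2, '0')}`
    }
    const char = String.fromCharCode(code)
    const mayEscaped = JSON.stringify(char)
    if (mayEscaped != `"${char}"`) {
        if (mayEscaped.length == 4) {
            // short escape such as "\r", "\n" or "\""
            return mayEscaped.slice(1, 3)
        }
        // "\u00XX" from JSON: rewrite as "\xXX"
        return '\\x' + mayEscaped.slice(5, 7)
    }
    return char
}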

Mochamad Arifin

The escape() function was deprecated in JavaScript version 1.5. Use encodeURI() or encodeURIComponent() instead.

Example:

string:             "May/June 2016, Volume 72, Issue 3"
escape:             "May/June%202016%2C%20Volume%2072%2C%20Issue%203"
encodeURI:          "May/June%202016,%20Volume%2072,%20Issue%203"
encodeURIComponent: "May%2FJune%202016%2C%20Volume%2072%2C%20Issue%203"
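The table can be reproduced with a few lines (escape() is a legacy Annex B function, but it is still available in browsers and Node.js). The loop prints each function's name followed by its encoded string:

```javascript
const input = "May/June 2016, Volume 72, Issue 3";

// Built-in functions expose their name via fn.name
for (const fn of [escape, encodeURI, encodeURIComponent]) {
  console.log(fn.name.padEnd(20) + JSON.stringify(fn(input)));
}
```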

Source: https://www.w3schools.com/jsref/jsref_escape.asp

Basheer AL-MOMANI
    Those two functions don't do the same thing as escape. The question even states this. Also, W3Schools is not a great resource to link to. – Ralph King Mar 29 '18 at 10:13
  • `s = 'a?b/c<d>e"f\'g'; console.log(escape(s), encodeURI(s), encodeURIComponent(s)) #=> a%3Fb/c%3Cd%3Ee%22f%27g a?b/c%3Cd%3Ee%22f'g a%3Fb%2Fc%3Cd%3Ee%22f'g`; only escape protects against XSS correctly (it is not possible to break out of a quoted attribute) – localhostdotdev Apr 12 '19 at 12:53