Using Unicode in JavaScript

Question

In JavaScript we can use the line of code below (which uses a Unicode escape) to display the copyright symbol:

var x = "\u00A9 RPeripherals";

Why can't we type the copyright symbol directly using its Alt code (Alt+0169), like below:

var x = "© RPeripherals";

What is the difference between these two methods?

The JS string parser will see that \u00a9 and convert it to the equivalent character in the font you're using (no guarantees that 00a9 will always be a copyright char, though). At least leaving it as a textual `\u00a9` is less likely to be mangled if you use a different charset. None of the constituent characters are "mangleable" if there's a charset mismatch somewhere, while embedding the actual value represented by that escape sequence IS easily mangleable. – Marc B, Sep 30 '12 at 18:39

score 2 · Accepted Answer · 2012-10-01T09:48:33.497

Why can't we type the copyright symbol directly using ALT code (alt+0169) like below :

Who says so? Of course you can. Just configure your code editor to use UTF-8 encoding for source files. You should never use anything else to begin with...

What is the difference between these two methods?

The difference is that with the \uXXXX scheme you transmit a six-byte ASCII sequence in place of a character that takes one to three bytes in UTF-8, so you send three to five extra bytes per character on the wire (four for ©). This kind of spelling may help if you need to embed characters in your source code that your font cannot display properly. For example, I don't have traditional Chinese characters in the font I use for programming, so if I type Chinese characters into my code editor, I'll see a bunch of question marks or rectangles with Unicode code point digits instead of the actual characters. But someone who has Chinese glyphs in their font wouldn't have that problem.

If that person and I want to share our source code, it would be preferable for the other person to use the \uXXXX scheme, as I would then be able to verify which character it is by looking it up in the Unicode table. That's about all the difference.
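The byte overhead of the escape form can be measured directly. This is only a minimal sketch, assuming a runtime that provides the standard TextEncoder (modern browsers and Node.js); utf8Bytes is a hypothetical helper name, not something from the answer:

```javascript
// Compare the UTF-8 size of a character with the size of its \uXXXX
// spelling as it would appear in a source file.
const encoder = new TextEncoder(); // always encodes to UTF-8

// Hypothetical helper: number of bytes `text` occupies in UTF-8.
function utf8Bytes(text) {
  return encoder.encode(text).length;
}

const literal = "\u00A9";      // the © character itself
const escapeText = "\\u00A9";  // the six ASCII characters \ u 0 0 A 9

const literalSize = utf8Bytes(literal);
const escapeSize = utf8Bytes(escapeText);
const extraBytes = escapeSize - literalSize;

console.log(literalSize, escapeSize, extraBytes); // 2 6 4
console.log(utf8Bytes("A"));      // 1: ASCII characters stay one byte
console.log(utf8Bytes("\u4E2D")); // 3: a CJK character in the BMP
```

Note that plain ASCII text does not grow when a file is saved as UTF-8; only the non-ASCII characters take more than one byte.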

EDIT

The ECMAScript standard (ECMA-262, edition 5.1) says specifically that

A conforming implementation of this Standard shall interpret characters in conformance with the Unicode Standard, Version 3.0 or later and ISO/IEC 10646-1 with either UCS-2 or UTF-16 as the adopted encoding form, implementation level 3. If the adopted ISO/IEC 10646-1 subset is not otherwise specified, it is presumed to be the BMP subset, collection 300. If the adopted encoding form is not otherwise specified, it presumed to be the UTF-16 encoding form.

So, the standard guarantees that the character encoding is Unicode, and enforces the use of UTF-16 (that's strange, I thought it was UTF-8), but I don't think that this is what happens in practice... I believe that browsers use UTF-8 by default. Perhaps this has changed in later standards, but this is the last one that is universally accepted.
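One way to read the quoted passage is that it describes how string values look to a running program (as sequences of 16-bit code units), which is a separate matter from the encoding a source file is saved or served in. A small sketch using only built-in string methods; the variable names are illustrative:

```javascript
// A BMP character such as © occupies a single 16-bit code unit.
const copyright = "\u00A9";
const copyrightUnits = copyright.length;       // number of code units
const copyrightUnit = copyright.charCodeAt(0); // value of the first unit

// A character outside the BMP is held as a surrogate pair,
// so it counts as two code units even though it is one code point.
const clef = "\uD834\uDD1E"; // U+1D11E MUSICAL SYMBOL G CLEF
const clefUnits = clef.length;
const clefCodePoint = clef.codePointAt(0); // ES2015 and later

console.log(copyrightUnits, copyrightUnit.toString(16)); // 1 "a9"
console.log(clefUnits, clefCodePoint.toString(16));      // 2 "1d11e"
```

The same results come out whether the script file itself is saved as UTF-8 or in another encoding, because the escapes are pure ASCII.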

Thanks... So that means we should always use Unicode to avoid problems. Right? – Rohit P, Oct 01 '12 at 06:16
I'm not entirely sure of what I'm saying, so please correct me if I'm wrong, but as far as I know, saving your file in an encoding bigger than standard ASCII (1 byte per character, 2 in the case of extended) will only result in a heavier file to transmit, since now EVERY character will use a larger number of bytes (4 for Unicode). Therefore it would be better to use the \uXXXX notation and let the client computer determine that the next four bytes in the stream represent a single character than to turn all of our characters into 4 bytes. – Ordiel, Oct 27 '14 at 02:24
Besides, consider readability: since not all applications can display those characters, it would be better to get a character code than a square when trying to determine what the developer wanted to print. – Ordiel, Oct 27 '14 at 02:25

score 0 · Answer 2 · Quentin · 2014-04-25T13:20:52.747

Why can't we type the copyright symbol directly

Because JavaScript engines are capable of parsing UTF-8 encoded source files.

What is the difference between these two methods?

One is short, requires the source file be encoded in an encoding that supports the character, and requires that you type a character that isn't printed on the keyboard's buttons.

The other is (comparatively) long, can be expressed entirely in ASCII, and can be typed with characters printed on the buttons of a standard keyboard.
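Whichever spelling is used, the resulting string value is the same at runtime. A minimal sketch, assuming the script file is saved and served as UTF-8 (otherwise the typed © may be misread); the variable names are illustrative:

```javascript
// Three spellings of the same string value.
const escaped = "\u00A9 RPeripherals"; // pure ASCII in the source file
const typed = "© RPeripherals";        // needs a correctly declared file encoding
const built = String.fromCodePoint(0xA9) + " RPeripherals";

console.log(escaped === built); // true regardless of file encoding
console.log(escaped === typed); // true when the file is read as UTF-8
console.log(escaped.charCodeAt(0).toString(16)); // "a9"
```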

edited Apr 25 '14 at 13:20

answered Sep 30 '12 at 18:38


Another difference not mentioned: If you Unicode-escape a character like a newline, it will not terminate a string literal as if you had typed a newline directly. See https://developer.mozilla.org/en-US/docs/JavaScript/Guide/Values,_variables,_and_literals#Unicode for further explanation. – jonvuri Sep 30 '12 at 18:43

The person asked "Why can't we type", not "Why can we". – Sam YC Apr 25 '14 at 13:18
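The escaped-newline behaviour mentioned in jonvuri's comment can be sketched as follows. This is only an illustration: the raw-newline case is built as source text at runtime and compiled with new Function, which some environments (for example pages with a strict Content Security Policy) may block.

```javascript
// An escaped newline is an ordinary character inside the literal.
const escapedNewline = "line one\u000Aline two";
const lines = escapedNewline.split("\n");
console.log(lines.length); // 2

// A raw newline typed inside the quotes ends the line mid-literal,
// which is a syntax error. The broken source is assembled as text here
// so that this example file itself still parses.
const brokenSource = 'var s = "line one' + "\n" + 'line two";';
let rawNewlineFails = false;
try {
  new Function(brokenSource); // compiles the text without running it
} catch (err) {
  rawNewlineFails = err instanceof SyntaxError;
}
console.log(rawNewlineFails); // true
```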
