
I'm having trouble with text processing.

(A) 'khách hàng'

(B) 'khách hàng'

A and B look the same, but they are not equal in code. They look identical, right?

You can reproduce the problem by pressing F12, opening the Console, pasting the two quoted strings above with === between them, and pressing Enter.

Can I convert A and B to the same encoding? If so, how?

Thanks!

Anh Tuan
    Possible duplicate of [How do I check equality of Unicode strings in Javascript?](https://stackoverflow.com/questions/7097867/how-do-i-check-equality-of-unicode-strings-in-javascript) – Biffen Jun 28 '19 at 08:20

1 Answer


Yes. The two strings (A) and (B) use different Unicode normalization forms: NFC (Canonical Composition) for (A) and NFD (Canonical Decomposition) for (B).

(A) khách hàng: U+006B U+0068 U+00E1 U+0063 U+0068 U+0020 U+0068 U+00E0 U+006E U+0067

(B) khách hàng: U+006B U+0068 U+0061 U+0301 U+0063 U+0068 U+0020 U+0068 U+0061 U+0300 U+006E U+0067
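
The code point listings above can be reproduced in the console. The following is a minimal sketch; `codePoints` is a hypothetical helper name (not a built-in), and the strings are written with escape sequences so that their exact code points survive copying and pasting.

```javascript
// Hypothetical helper (not a built-in): format each code point of a string as U+XXXX.
function codePoints(str) {
  // Array.from iterates a string by code point, not by UTF-16 code unit.
  return Array.from(str, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  ).join(' ');
}

const A = 'kh\u00E1ch h\u00E0ng';   // (A): precomposed letters
const B = 'kha\u0301ch ha\u0300ng'; // (B): base letters + combining accents

console.log(codePoints(A));
console.log(codePoints(B));
console.log(A.length, B.length); // 10 12
```

The differing lengths are a quick hint that two visually identical strings are not stored the same way.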

In order to compare them for equality, both strings must be normalized first to the same form; in JavaScript, this can be achieved through the normalize() method:

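// Escape sequences keep the exact code points, even if this code is copied and pasted.
const A = 'kh\u00E1ch h\u00E0ng';   // (A): precomposed á and à (NFC)
const B = 'kha\u0301ch ha\u0300ng'; // (B): a followed by combining accents (NFD)
console.log(A === B); // -> false
console.log(A.normalize('NFC') === B.normalize('NFC')); // -> true
console.log(A.normalize('NFD') === B.normalize('NFD')); // -> true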
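
If this comparison is needed in several places, it could be wrapped in a small function. This is only a sketch: `equalsNormalized` is a hypothetical name, and defaulting to NFC is an assumption; any form works as long as both sides use the same one.

```javascript
// Hypothetical helper: compare two strings after normalizing both to the same form.
// The default form 'NFC' is an assumption; 'NFD' would work equally well here.
function equalsNormalized(a, b, form = 'NFC') {
  return a.normalize(form) === b.normalize(form);
}

const composed = 'kh\u00E1ch h\u00E0ng';     // NFC spelling
const decomposed = 'kha\u0301ch ha\u0300ng'; // NFD spelling

console.log(equalsNormalized(composed, decomposed));   // true
console.log(equalsNormalized(composed, 'khach hang')); // false: accents still matter
```

Normalization only unifies equivalent encodings of the same characters; it does not strip accents.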

Warning: depending on your web browser, especially Firefox or Safari, copying and pasting string (B) may turn it into string (A); it seems that an unexpected normalization step is performed "behind the scenes"...
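
One way to check whether a pasted string was silently changed is to test which form it is currently in. The sketch below assumes a hypothetical helper named `isNormalized`; it simply compares a string with its own normalized copy.

```javascript
// Hypothetical helper: true if the string is already in the given normalization form.
function isNormalized(str, form = 'NFC') {
  return str === str.normalize(form);
}

const composed = 'kh\u00E1ch h\u00E0ng';     // what (A) should contain
const decomposed = 'kha\u0301ch ha\u0300ng'; // what (B) should contain

console.log(isNormalized(composed, 'NFC'));   // true
console.log(isNormalized(decomposed, 'NFC')); // false
console.log(isNormalized(decomposed, 'NFD')); // true
```

If a pasted copy of (B) reports true for 'NFC', it was most likely converted to the composed form somewhere along the way.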

Norman