In ColdFusion I can determine the ASCII value of a character by using asc().

How do I determine the UTF-8 value of a character?

James A Mohler
  • 2
    Do you mean that you have the UTF character in your editor and you want to get the UTF-8 representation of it, as in `U+00A2`? Have you looked at `CharsetDecode()` https://helpx.adobe.com/coldfusion/cfml-reference/coldfusion-functions/functions-c-d/CharsetDecode.html – Redtopia Nov 27 '18 at 20:30

1 Answer

<cfscript>

    x = "漢"; // 3 bytes

    // UTF-8 bytes of the character, a.k.a. String.getBytes("UTF-8")
    bytes = charsetDecode(x, "UTF-8");
    writeDump(bytes); // -26-68-94 (signed bytes: -26, -68, -94)

    // convert the 3 bytes to Hex
    hex = binaryEncode(bytes, "HEX");
    writeDump(hex); // E6BCA2

    // convert the Hex to Dec
    dec = inputBaseN(hex, 16);
    writeDump(dec); // 15121570

    // asc() uses the UCS-2 representation: 漢 = Hex 6F22 = Dec 28450
    // (stored in codeUnit rather than a variable that shadows the asc() function)
    codeUnit = asc(x);
    writeDump(codeUnit); // 28450

</cfscript>
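If the individual bytes are easier to read than one large decimal number, the hex string can be split into byte pairs. This is only a sketch: `utf8HexBytes` is a made-up helper name, not a built-in function, and it assumes the same `charsetDecode()` / `binaryEncode()` calls used above.

```cfml
<cfscript>

    // Hypothetical helper: returns the UTF-8 bytes of a string as space-separated hex pairs
    function utf8HexBytes(required string s) {
        // one hex string for all bytes, e.g. E6BCA2
        var hex = binaryEncode(charsetDecode(arguments.s, "UTF-8"), "HEX");
        var pairs = [];
        // every two hex digits represent one byte
        for (var i = 1; i <= len(hex); i += 2) {
            arrayAppend(pairs, mid(hex, i, 2));
        }
        return arrayToList(pairs, " ");
    }

    writeOutput(utf8HexBytes("漢")); // E6 BC A2

</cfscript>
```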

UCS-2 is fixed at 2 bytes per character, so it cannot represent every Unicode character (a character can take up to 4 bytes in UTF-8). But what are you actually trying to achieve here?
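To illustrate the 2-byte limit, here is a rough sketch using a character outside the 16-bit range (U+1F600). It assumes that CF strings are backed by `java.lang.String`, so the Java method `codePointAt()` can be called on them directly, and that the file is read as UTF-8. That is my assumption, not something I have verified on every engine.

```cfml
<cfscript>

    y = "😀"; // U+1F600, needs 4 bytes in UTF-8

    // same approach as above: UTF-8 bytes as hex
    hex = binaryEncode(charsetDecode(y, "UTF-8"), "HEX");
    writeDump(hex); // F09F9880

    // full code point via the underlying Java string (assumed to be available)
    codePoint = y.codePointAt(javaCast("int", 0));
    writeDump(codePoint); // 128512 = Hex 1F600

</cfscript>
```

A single 16-bit value cannot hold 128512, which is why a 2-byte representation is not enough for characters like this one.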

Note: If you run this example and get more than 3 bytes returned, make sure CF picks up the file as UTF-8 (with BOM).

Alex