Questions tagged [codepoint]

A code point is a numeric value in the Unicode codespace, which ranges from U+0000 to U+10FFFF.

A code point may represent a character or have another meaning. The Unicode Standard defines seven fundamental classes of code points: Graphic, Format, Control, Private-Use, Surrogate, Noncharacter, and Reserved.
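
As a rough illustration of the description above, the following Python sketch prints a character's code point together with its Unicode General Category, using only the standard library. The helper name `describe` is made up for this example. The General Category is related to the seven classes but is not the same thing: for instance, both noncharacters and reserved code points are reported as `Cn`.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return 'U+XXXX <General_Category>' for a single-character string.

    `describe` is a hypothetical helper for illustration only.
    """
    cp = ord(ch)  # numeric code point of the character
    return f"U+{cp:04X} {unicodedata.category(ch)}"

print(describe("A"))           # Graphic (uppercase letter)
print(describe("\r"))          # Control
print(describe("\U0001D11E"))  # Graphic, outside the BMP (G clef)
print(describe("\uE000"))      # Private-Use
print(describe(chr(0xD800)))   # Surrogate
print(describe("\uFFFF"))      # Noncharacter (reported as unassigned)
```

Note that `chr()` and `ord()` work on whole code points, so supplementary characters such as U+1D11E need no special handling here.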

116 questions
8
votes
2 answers

Writing a better natural sort (than mine)

I added an answer to this question here: Sorting List in C# which calls for a natural sort order, one that handles embedded numbers. My implementation, however, is naive, and in lieu of all the posts out there about how applications doesn't…
Lasse V. Karlsen
  • 380,855
  • 102
  • 628
  • 825
8
votes
2 answers

How to read non-BMP (astral) Unicode supplementary characters (code points)

The G-Clef (U+1D11E) is not part of the Basic Multilingual Plane (BMP), which means that it requires more than 16 bits. Almost all of Java's read functions return only a char or an int that also contains only 16 bits. Which function reads complete Unicode…
ceving
  • 21,900
  • 13
  • 104
  • 178
7
votes
3 answers

Convert from hex character to Unicode character in python

The hex string '\xd3' can also be represented as: Ó. The easiest way I've found to get the character representation of the hex string to the console is: print unichr(ord('\xd3')) Or in English, convert the hex string to a number, then convert that…
Kevin Burke
  • 61,194
  • 76
  • 188
  • 305
7
votes
3 answers

Retrieve Unicode code points > U+FFFF from QChar

I have an application that is supposed to deal with all kinds of characters and at some point display information about them. I use Qt and its inherent Unicode support in QChar, QString etc. Now I need the code point of a QChar in order to look up…
Sebastian Negraszus
  • 11,915
  • 7
  • 43
  • 70
7
votes
4 answers

How can I convert a Unicode codepoint (\uXXXX) into a character in Perl?

I have some unicode codepoints (\u5315\u4e03\u58ec\u4e8c\u4e0a\u53b6\u4e4b), which I have to convert into actual characters they represent. What's the simplest way to do so?
Peterim
  • 1,029
  • 4
  • 16
  • 25
6
votes
4 answers

Java unicode where to find example N-byte unicode characters

I'm looking for sample 1-byte, 2-byte, 3-byte, 4-byte, 5-byte, and 6-byte unicode characters. Any links to some sort of reference of all the different unicode characters out there and how big they are (byte-wise) would be greatly appreciated. I'm…
Mohamed Nuur
  • 5,536
  • 6
  • 39
  • 55
6
votes
4 answers

Convert UTF8 string into numeric values in Perl

For example, my $str = '中國c'; # Chinese language of china I want to print out the numeric values 20013,22283,99
Howard
  • 19,215
  • 35
  • 112
  • 184
6
votes
5 answers

Why are there duplicate characters in Unicode?

I can see some duplicate characters in Unicode. For example, the character 'C' can be represented by the code points U+0043 and U+0421. Why is this so?
Sirish
  • 9,183
  • 22
  • 72
  • 107
5
votes
2 answers

Comparing characters in Rebol 3

I am trying to compare characters to see if they match. I can't figure out why it doesn't work. I'm expecting true on the output, but I'm getting false. character: "a" word: "aardvark" (first word) = character ; expecting true, getting false
beeflobill
  • 317
  • 1
  • 10
5
votes
6 answers

What do these Unicode characters (codepoints) mean in this regex?

I have the following regular expression : I figured out most of the part which is as follows : ValidationExpression="^[\u0020\u0027\u002C\u002D\u0030-\u0039\u0041-\u005A\u005F\u0061-\u007A\u00C0-\u00FF°./]{1,256}$" u0020 : SPACE u0027 :…
Murtaza Mandvi
  • 10,708
  • 23
  • 74
  • 109
4
votes
2 answers

How do you turn an Array of codepoints (Int32) to a string?

In Crystal, a String can be turned into an Array(Int32) of codepoints: "abc".codepoints # [97,98,99] Is there a way to turn the Array back into a String?
dgo.a
  • 2,634
  • 23
  • 35
4
votes
3 answers

What is the proper way to get a char's code point?

I need to do some stuff with codepoints and a newline. I have a function that takes a char's codepoint, and if it is \r it needs to behave differently. I've got this: if (codePoint == Character.codePointAt(new char[] {'\r'}, 0)) { but that is…
Pokechu22
  • 4,984
  • 9
  • 37
  • 62
4
votes
1 answer

Haskell: convert unicode integer to actual unicode character

Suppose that my Haskell function is given an input, which is supposed to be the number of a unicode code point. How can one convert this to the corresponding character? Example: 123 to '{'.
Derek Thurn
  • 14,953
  • 9
  • 42
  • 64
3
votes
1 answer

Count codepoints in a string in Elixir

The String.length/1 function returns the number of graphemes in a UTF-8 binary. If I want to know how many Unicode codepoints are in the string, I know I can do: string |> String.codepoints |> length But this produces an unnecessary intermediate…
Adam Millerchip
  • 20,844
  • 5
  • 51
  • 74
3
votes
1 answer

How to convert a string representation of unicode hex "0x20000" to the int code point 0x20000 in Java

I have a list of String representations of unicode hex values such as "0x20000" (𠀀) and "0x00F8" (ø) that I need to get the int code point of so that I can use functions such as: char[] chars = Character.toChars(0x20000); This should cover the BMP…
syzygy
  • 33
  • 1
  • 3