Questions tagged [codepoint]

A code point is one of the numeric values that make up the Unicode codespace.

A code point may represent a character or have another meaning; the standard defines seven fundamental classes of code points: Graphic, Format, Control, Private-Use, Surrogate, Noncharacter, and Reserved.
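As a rough illustration of those seven classes, the sketch below maps a code point to a class using the General_Category values from Python's standard `unicodedata` module. The function name `code_point_class` is hypothetical, and the result for "Reserved" depends on the Unicode version bundled with the interpreter.

```python
import unicodedata


def code_point_class(cp: int) -> str:
    """Return one of the seven fundamental code point classes (sketch)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode codespace")
    # Noncharacters: U+FDD0..U+FDEF and the last two code points of each plane.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "Noncharacter"
    category = unicodedata.category(chr(cp))
    if category == "Cc":
        return "Control"
    if category in ("Cf", "Zl", "Zp"):
        return "Format"
    if category == "Co":
        return "Private-Use"
    if category == "Cs":
        return "Surrogate"
    if category == "Cn":
        # Unassigned in the Unicode version known to this Python build.
        return "Reserved"
    # Letters, marks, numbers, punctuation, symbols, and space separators.
    return "Graphic"


for cp in (0x41, 0x0A, 0xAD, 0xE000, 0xD800, 0xFFFF):
    print(f"U+{cp:04X} {code_point_class(cp)}")
```

Checking the noncharacter ranges first matters because `unicodedata` reports them as `Cn`, the same category it uses for reserved code points.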

116 questions
2
votes
0 answers

How to convert unicode emoji into hex codepoint (with multiple groups)

I'm building an application that takes emoji shortnames (like :flag_cf:) and converts them through a series of operations into hex codepoints (which are the keys in a map that returns Twitter emoji/twemoji). I have a utility…
zahabba
2
votes
1 answer

Check whether a Unicode code point is assigned

Go has the unicode package, containing useful functions such as IsGraphic or IsPrint. One function that is missing though is IsAssigned. Of course I could write my own function by using the other functions. But I would rather expect the standard…
Roland Illig
2
votes
1 answer

Unicode Codepoints for special characters in MS Keyboard Layout Creator

My goal: I am trying to get the MS Keyboard Layout Creator to allow me to perform a carriage return/enter whenever I hit the [R-Arrow] key in combination with the [Control] key, but still have the [R-Arrow] key perform as normal (i.e. move one…
2
votes
1 answer

How to get Unicode code point(s) representation of character/string in Swift?

As a generic solution, how can we get the Unicode code point(s) for a character or a string in Swift? Consider the following: let A: Character = "A" // "\u{0041}" let Á: Character = "Á" // "\u{0041}\u{0301}" let sparklingHeart = "💖" //…
Ahmad F
2
votes
1 answer

How to substring a String containing 4-byte characters?

I have a String that could contain 4-byte characters. For example: String s = "\uD83D\uDC4D1234\uD83D\uDC4D"; I also have a size that I should use to get a substring from it. The size is in characters. So let's say that size is 5, so I should get…
Federico Pugnali
2
votes
3 answers

How to convert an accented character in a Unicode string to its Unicode character code using Python?

I just wonder how to convert a Unicode string like u'é' to its Unicode character code u'\xe9'?
boativan66
2
votes
2 answers

Are surrogate pairs the only way to represent code points larger than 2 bytes in UTF-16?

I know that this is probably a stupid question, but I need to be sure on this issue. So I need to know for example if a programming language says that its String type uses UTF-16 encoding, does that mean: it will use 2 bytes for code points in the…
user4344762
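For reference, a minimal sketch of how UTF-16 represents a code point: values below U+10000 take one 16-bit code unit, and everything above is split into a high and a low surrogate. The helper name `utf16_code_units` is hypothetical; the final assertion cross-checks the arithmetic against Python's built-in `utf-16-be` codec.

```python
def utf16_code_units(cp: int) -> list[int]:
    """Encode one code point as a list of 16-bit UTF-16 code units (sketch)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x10000:
        # Basic Multilingual Plane: a single code unit.
        return [cp]
    # Supplementary planes: subtract 0x10000, then split the 20 remaining bits.
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return [high, low]


units = utf16_code_units(0x1F44D)  # THUMBS UP SIGN
print([f"{u:04X}" for u in units])

# Cross-check against the built-in codec.
encoded = b"".join(u.to_bytes(2, "big") for u in units)
assert encoded == "\U0001F44D".encode("utf-16-be")
```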
2
votes
4 answers

How to find if a character belongs to a particular codepage using C++ or calling WinAPI

How can we find out whether a character belongs to a particular codepage? Or how can we determine whether a character fits into the currently active IME for an application?
Prakash
2
votes
2 answers

Python `os` returning files that `os` thinks don't exist

I have a collection of files from an older Mac OS file store. I know that there are filename / path name issues with the collection. The issue stems from the inclusion of a codepoint in the path that I think was rendered as a dash in the original…
Jay Gattuso
2
votes
0 answers

What is the purpose of Default Ignorable Code Points in font rendering?

According to this link, U+00AD is a default ignorable code point. What exactly is the purpose of these default ignorable code points? What is the harm if I want to render U+00AD? The link provides some information, but I am not really…
user1414696
2
votes
1 answer

Convert codepoint to wchar_t in C

If I know the Unicode code points of these 2 Chinese characters 你好, stored in str, how can I convert this char * str of code points to Chinese characters and assign it to wchar_t * wstr? char * str = "4F60 597D"; wchar_t * wstr; I know that I can directly assign…
William
2
votes
2 answers

How to make QChar.unicode() report the UTF-16 representation of combined characters?

I'm trying to write a codec for Code page 437. My plan was to just pass the ASCII characters through and map the remaining 128 characters in a table, using the UTF-16 value as key. For some combined characters (letters with dots, tildes etcetera),…
Daniel Näslund
1
vote
0 answers

How to convert UTF-8 string to Unicode code point in PHP?

Possible Duplicate: UTF-8 to Unicode Code Points. UTF-8 strings are to be converted to Unicode code points. How to convert a UTF-8 string to its corresponding Unicode code points?
tuxnani
1
vote
1 answer

Unicode to CodePoint C++

How can I get the codepoint from a Unicode value? According to the character code table, the Code Point for the pictogram '丂' is 8140, and the Unicode is \u4E02. I made this app in C++ to try to get the CP for a Unicode string value: #include…
Ferrus
1
vote
2 answers

Are there examples of ISO 8859-1 text files which are valid, but different in UTF-8?

I know that UTF-8 supports way more characters than Latin-1 (even with the extensions). But are there examples of files that are valid in both, but the characters are different? So essentially that the content changes, depending on how you think the…
Martin Thoma
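As a small illustration of the situation this question describes, the sketch below decodes one byte sequence both ways. The sample bytes are an assumption chosen for the demonstration: they are valid in both encodings but yield different text, because UTF-8 reads the two non-ASCII bytes as a single code point while ISO 8859-1 reads each byte as its own character.

```python
# Sample bytes chosen for illustration: "caf" followed by 0xC3 0xA9.
data = b"caf\xc3\xa9"

as_utf8 = data.decode("utf-8")      # 0xC3 0xA9 is one code point, U+00E9
as_latin1 = data.decode("latin-1")  # 0xC3 and 0xA9 are two code points

print([f"U+{ord(c):04X}" for c in as_utf8])
print([f"U+{ord(c):04X}" for c in as_latin1])

# Pure ASCII bytes decode identically under both encodings.
assert b"cafe".decode("utf-8") == b"cafe".decode("latin-1")
```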