Questions tagged [codepoint]

A code point is any numeric value in the Unicode codespace, which ranges from U+0000 to U+10FFFF.

A code point may represent a character or have another meaning; the standard defines seven fundamental classes of code points: Graphic, Format, Control, Private-Use, Surrogate, Noncharacter, and Reserved.
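As a rough illustration of the seven classes, the sketch below maps an integer code point to one of them using Python's standard `unicodedata` module. The function name `code_point_class` is made up for this example, and the mapping from general categories to classes is an assumption based on the general-category groupings in the Unicode Standard. Results for unassigned code points depend on the Unicode version bundled with the interpreter.

```python
import unicodedata


def code_point_class(cp: int) -> str:
    """Return the basic code point class for an integer code point (hypothetical helper)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode codespace")
    # Noncharacters: U+FDD0..U+FDEF plus the last two code points of every plane.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "Noncharacter"
    category = unicodedata.category(chr(cp))
    if category == "Cc":
        return "Control"
    if category in ("Cf", "Zl", "Zp"):
        return "Format"
    if category == "Co":
        return "Private-Use"
    if category == "Cs":
        return "Surrogate"
    if category == "Cn":
        # Unassigned in the interpreter's Unicode version and not a noncharacter.
        return "Reserved"
    # Letters, marks, numbers, punctuation, symbols, and space separators.
    return "Graphic"
```

For instance, `code_point_class(ord("A"))` falls through to the final branch and is reported as Graphic, while a lone surrogate such as `0xD800` is reported as Surrogate.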

116 questions
1 vote, 1 answer

How to sort strings in JavaScript by code point values?

I need to sort an array of strings, where elements are compared lexicographically as sequences of code point values, so that, for example, "Z" < "a" < "\udabc" < "�" < "". Is there a more efficient way of comparing strings, other than manually…
abacabadabacaba
  • 2,662
  • 1
  • 13
  • 18
1 vote, 0 answers

Get first printable character from a string

This might seem like an already answered question, but I couldn't find it anywhere. How do I get the first printable character in Java? For example abcd //should return "a" - The first printable char is 1 byte //should return "" - The…
Pankaj Singhal
  • 15,283
  • 9
  • 47
  • 86
1 vote, 1 answer

Reading Glyphs from a String using codePointAt(i) or Charset issue

I created a text editor for JavaFX which paints the text on a Canvas, glyph by glyph. I use String.codePointAt(i) to correctly load the glyphs. Somehow the first glyph is a strange one, I don't know why. The file was loaded using Charset UTF-16…
DbSchema
  • 413
  • 5
  • 16
1 vote, 1 answer

In Java, what's the difference between Character.isBmpCodePoint and Character.isValidCodePoint?

In Java, what's the difference between Character.isBmpCodePoint and Character.isValidCodePoint? I mean, I know about 0x10FFFF and 0xFFFF, but what does that imply? Which should I use?
FredSuvn
  • 1,869
  • 2
  • 12
  • 19
1 vote, 0 answers

Why are codepoints in the block CJK UNIFIED IDEOGRAPHS EXTENSION B not named according to the group pattern?

In the Java standard library, Character.getName(0x2000A) returns "CJK UNIFIED IDEOGRAPHS EXTENSION B 2000A" (in Java 11, 16 and 17, using Unicode version 10 and Unicode version 13), while I expected "CJK UNIFIED IDEOGRAPHS-2000A". The result…
Martijn
  • 11,964
  • 12
  • 50
  • 96
1 vote, 1 answer

Build a token for Simplified Chinese Identifiers

I'm trying to build a token for Simplified Chinese Identifiers. Simplified Chinese Identifiers are defined in the specification as follows: simplified-Chinese-identifier = first-sChinese-identifier-character…
SoftTimur
  • 5,630
  • 38
  • 140
  • 292
1 vote, 1 answer

How do I reverse `String.fromCodePoint`, i.e. convert a string to an array of code points?

String.fromCodePoint(...[127482, 127480]) gives me a flag of the US (🇺🇸). How do I turn the flag back to [127482, 127480]?
ppt
  • 946
  • 8
  • 18
1 vote, 0 answers

Why doesn't the Unicode codepoint escape syntax work in PHP?

I am confused about the Unicode codepoint escape syntax. Here is a demo: //this works fine echo "\u{1f602}"; // echoes 😂 //this doesn't work $var = '1f602'; echo '"\u{' . $var . '}"'; // outputs \u1f602 After searching, I found that eval will make it work…
1 vote, 1 answer

Character Issues

Backstory: I basically retrieve strings from a database. I alter some text of those strings. Then I upload those strings back to the database, replacing the original strings. After looking at the front-end that displays those strings, I noticed the…
SedJ601
  • 12,173
  • 3
  • 41
  • 59
1 vote, 2 answers

Java string: convert Unicode code point to character

Ok, so I feel like this question was asked many times but I am not able to find an answer. I am comparing two different files that were generated by two different programs. Of course both programs are generating the files from the same DB queries. I…
Mohamed Nuur
  • 5,536
  • 6
  • 39
  • 55
1 vote, 1 answer

"Width" of character on screen

I'm using Ncurses to write a text editor. I would like to know if there is a way to determine how many different characters can be placed on screen, where each character is encoded with UTF-8. For example when I get a screen width of 10 and one…
Mateusz Wojtczak
  • 1,621
  • 1
  • 12
  • 28
1 vote, 1 answer

Codepoint mismatch between Java and C

So, I'm having some problems with the following char – in a port of imgui to Kotlin. After digging the whole day into Charsets and encodings, I came down to my only hope: rely on the Unicode codepoints. That char on the JVM "–"[0].toInt() // same as…
elect
  • 6,765
  • 10
  • 53
  • 119
1 vote, 1 answer

How can I tell if a Unicode code point is one complete printable glyph (or grapheme cluster)?

Let's say there's a Unicode String object, and I want to print each Unicode character in that String one by one. In my simple test with very limited languages, I could successfully achieve this just assuming one code point is always the same as one…
Jenix
  • 2,996
  • 2
  • 29
  • 58
1 vote, 1 answer

Efficient lookup table for Unicode code points

I'm wondering how a Unicode code point lookup table is typically implemented. That is, given a character such as a, return U+24B6, or vice versa. I'm wondering if there are any efficient tricks so that it doesn't just boil down to: a: U+24B6 b: ... c: ... Which…
Lance
  • 75,200
  • 93
  • 289
  • 503
1 vote, 3 answers

What is the most idiomatic way to convert a string to characters in Erlang?

What is the most idiomatic way to convert this: "helloworld" to ["h","e","l","l","o","w","o","r","l","d"] in Erlang?
Muhammad Lukman Low
  • 8,177
  • 11
  • 44
  • 54