A grapheme is a unit of writing, generally smaller than a word. In an ideographic language, a single grapheme may carry considerable meaning, but many languages use only a smaller alphabet where a few different graphemes are arranged in various ways to build units of meaning.
Questions tagged [grapheme]
31 questions
2
votes
2 answers
Grapheme support in python regex
I'm using the awesome regex module, trying its \X grapheme support.
First, I try with the plain old .
>>> print regex.match('.', 'Ä').group(0)
>>> print regex.match('..', 'Ä').group(0)
Ä
It went as expected. Move on to \X
>>> print…

Ron
- 7,588
- 11
- 38
- 42
1
vote
1 answer
Is there a list of all (or most of the) possible combinations of (extended) grapheme clusters?
So, I'm looking into coding some binary data as text, using single graphical units, after using already all the available ones for Java and UTF16, I'd like to expand my "pool".
I recently discovered grapheme clusters, where you can combine different…

elect
- 6,765
- 10
- 53
- 119
1
vote
1 answer
How to iterate over grapheme clusters in Crystal?
The Unicode standard defines a grapheme cluster as an algorithmic approximation to a "user-perceived character". A grapheme cluster more or less corresponds to what people think of as a single "character" in text. Therefore it is a natural and…

shadowtalker
- 12,529
- 3
- 53
- 96
1
vote
0 answers
Unicode GraphemeBreakProperty spec including extra characters?
I was looking at the Unicode GraphemeBreakProperty spec and according to the table specified in Unicode Standard Annex #29 the Prepend property should include all code points with Indic_Syllabic_Category = Consonant_Preceding_Repha or…

Sammcb
- 113
- 2
- 8
1
vote
1 answer
regular expression to match name initials - PCRE
I have a regular expression to get the initials of a name like below:
/\b\p{L}\./gu
it works fine with English and other languages until there are graphemes and combined charecters occur.
Like
क in Hindi and
ಕ in Kannada
are being matched
But,
के…

Prashanth Benny
- 1,523
- 21
- 33
1
vote
1 answer
Is the set of distinct graphemes infinite?
Is there any limit to the number of distinct graphemes that can be represented with a Unicode encoding such as UTF-8? Does, for example, the Unicode standard restrict the number of consecutive combining characters?

Anthony Faull
- 17,549
- 5
- 55
- 73
0
votes
2 answers
Are all "non-grapheme" code points invisible?
In a unicode string, each grapheme consists of one or more code points. However, there are some code points, such as the Zero-width joiner (ZWJ), which are never a part of a grapheme. The ZWJ is, in itself, invisible. Are all of those "non-grapheme"…

at54321
- 8,726
- 26
- 46
0
votes
0 answers
get unicode graphemes as unsplitted item with python2.7
any idea, if it is possible with regex (python 2.7) to get uniq chars unspitted into surrogate pairs for unicode graphemes?
According This Example this is possible with python 3.x.
See here:
>>> import regex
>>> s = ''
>>> for c in…

Egor Savin
- 39
- 7
0
votes
0 answers
Can I override the default rendering of certain graphemes with diacritics?
I'm looking to add two U+0307 COMBINING DOT ABOVE characters to a Cyrillic letter. When added to Latin letters, the dot diacritics stack, but with Cyrillic letters, the first dot is combined with the letter but the second is rendered to the side, as…

Ray Toal
- 86,166
- 18
- 182
- 232
0
votes
1 answer
how to formulate english graphemes from a string in Matlab by reducing time complexity?
I've been working in the grapheme to phoneme conversion in Matlab and trying to produce a more generalized code to first break the word into the particular consonents,digraphs and their related vowels and segment each inputted string (word) into its…

Meraki
- 33
- 1
- 5
0
votes
1 answer
convert similar sound word parts
I'm having trouble searching for the right terms here to solve the below problem; I'm sure it's a done thing, I just can't find the right terms to express the problem!
I'm basically trying to create a classifier that will take word comparison…

Manish Patel
- 4,411
- 4
- 25
- 48
0
votes
1 answer
grapheme š is always bold
i'm fighting with the font-style of š since hours.
i'm using the webfont "open sans" from google webfonts and tested the grapheme on googles review option. everything's fine, the š is thin and beautiful like the rest of the font. (sorry i cant post…

morkro
- 4,336
- 5
- 25
- 35
-1
votes
1 answer
Is there a way to map english letter(s) (or graphemes) in word from correspondent phoneme(s) in Python?
e.g. let's assume we have something like:
WOULD | YOU | LIKE | A | CUP | OF | TEA
w ʊ d | j uː | l a ɪ k | ə | k ʌ p | ʊ v | t iː
W UH D | Y UW | L AY K | AH | K AH P | AH V | T IY
And besides that I need to solve P2G problem, I also…

Ivan Kot
- 1
- 1
-1
votes
1 answer
Grapheme search in Java
So i am working on a project which involves searching of a word in different languages. I can easily get the Locale of the language but i dont know how to search for the word in another language. So the text can be in Chinese and the word to be…

Rohan
- 673
- 6
- 15
-1
votes
1 answer
How do I identify which letter of the alphabet a word starts with in Objective-C?
Given a string, I'm trying to determine which letter of the alphabet it belongs to. For example, "apple" goes into the "A" section. "Banana" goes into the "B" section. I'm using this to identify the section:
NSRange range = [string…

Hilton Campbell
- 6,065
- 3
- 47
- 79