Questions tagged [grapheme]

A grapheme is a unit of writing, generally smaller than a word. In an ideographic writing system, a single grapheme may carry considerable meaning, but many languages use a much smaller alphabet whose few graphemes are arranged in various ways to build units of meaning.
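As a rough illustration of why graphemes and code points differ, here is a minimal Python sketch using only the standard `unicodedata` module. The helper name `count_base_characters` is hypothetical, and counting non-combining code points is only a crude approximation. It is not the grapheme cluster segmentation defined in Unicode Standard Annex #29, and it will miscount emoji sequences and many scripts.

```python
import unicodedata

# "Ä" written as a base letter followed by a combining mark: two code points,
# but a reader perceives a single character.
decomposed = "A\u0308"  # LATIN CAPITAL LETTER A + COMBINING DIAERESIS
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed))        # number of code points before normalization
print(len(composed))          # number of code points after NFC normalization
print(composed == "\u00c4")   # NFC yields the precomposed LATIN CAPITAL LETTER A WITH DIAERESIS


def count_base_characters(text):
    """Hypothetical helper: count code points whose canonical combining class is 0.

    This is only an approximation of "user-perceived characters", not UAX #29.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


print(count_base_characters("e\u0301a\u0308"))  # two base letters, each with one combining mark
```

For real segmentation, a library that implements the UAX #29 rules (such as the `\X` support in the third-party `regex` module mentioned in the questions below) is the safer choice.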

31 questions
2
votes
2 answers

Grapheme support in python regex

I'm using the awesome regex module, trying its \X grapheme support. First, I try with the plain old . >>> print regex.match('.', 'Ä').group(0) >>> print regex.match('..', 'Ä').group(0) Ä It went as expected. Move on to \X >>> print…
Ron
1
vote
1 answer

Is there a list of all (or most of the) possible combinations of (extended) grapheme clusters?

So, I'm looking into coding some binary data as text, using single graphical units. Having already used all the available ones for Java and UTF-16, I'd like to expand my "pool". I recently discovered grapheme clusters, where you can combine different…
elect
1
vote
1 answer

How to iterate over grapheme clusters in Crystal?

The Unicode standard defines a grapheme cluster as an algorithmic approximation to a "user-perceived character". A grapheme cluster more or less corresponds to what people think of as a single "character" in text. Therefore it is a natural and…
shadowtalker
1
vote
0 answers

Unicode GraphemeBreakProperty spec including extra characters?

I was looking at the Unicode GraphemeBreakProperty spec and according to the table specified in Unicode Standard Annex #29 the Prepend property should include all code points with Indic_Syllabic_Category = Consonant_Preceding_Repha or…
Sammcb
1
vote
1 answer

regular expression to match name initials - PCRE

I have a regular expression to get the initials of a name, like below: /\b\p{L}\./gu It works fine with English and other languages until graphemes and combined characters occur. Letters like क in Hindi and ಕ in Kannada are matched, but के…
Prashanth Benny
1
vote
1 answer

Is the set of distinct graphemes infinite?

Is there any limit to the number of distinct graphemes that can be represented with a Unicode encoding such as UTF-8? Does, for example, the Unicode standard restrict the number of consecutive combining characters?
Anthony Faull
0
votes
2 answers

Are all "non-grapheme" code points invisible?

In a Unicode string, each grapheme consists of one or more code points. However, there are some code points, such as the zero-width joiner (ZWJ), which are never a part of a grapheme. The ZWJ is, in itself, invisible. Are all of those "non-grapheme"…
at54321
0
votes
0 answers

get unicode graphemes as unsplitted item with python2.7

Any idea whether it is possible with regex (Python 2.7) to get unique characters for Unicode graphemes without splitting them into surrogate pairs? According to this example, it is possible with Python 3.x. See here: >>> import regex >>> s = '‍‍‍' >>> for c in…
0
votes
0 answers

Can I override the default rendering of certain graphemes with diacritics?

I'm looking to add two U+0307 COMBINING DOT ABOVE characters to a Cyrillic letter. When added to Latin letters, the dot diacritics stack, but with Cyrillic letters, the first dot is combined with the letter but the second is rendered to the side, as…
Ray Toal
0
votes
1 answer

how to formulate english graphemes from a string in Matlab by reducing time complexity?

I've been working on grapheme-to-phoneme conversion in Matlab and trying to produce more generalized code to first break the word into the particular consonants, digraphs and their related vowels and segment each inputted string (word) into its…
Meraki
0
votes
1 answer

convert similar sound word parts

I'm having trouble searching for the right terms here to solve the below problem; I'm sure it's a done thing, I just can't find the right terms to express the problem! I'm basically trying to create a classifier that will take word comparison…
Manish Patel
0
votes
1 answer

grapheme š is always bold

I've been fighting with the font style of š for hours. I'm using the web font "Open Sans" from Google Web Fonts and tested the grapheme with Google's review option. Everything's fine there: the š is thin and beautiful like the rest of the font. (Sorry, I can't post…
morkro
-1
votes
1 answer

Is there a way to map english letter(s) (or graphemes) in word from correspondent phoneme(s) in Python?

E.g., let's assume we have something like:
WOULD | YOU | LIKE | A | CUP | OF | TEA
w ʊ d | j uː | l a ɪ k | ə | k ʌ p | ʊ v | t iː
W UH D | Y UW | L AY K | AH | K AH P | AH V | T IY
And besides that I need to solve P2G problem, I also…
Ivan Kot
-1
votes
1 answer

Grapheme search in Java

So I am working on a project which involves searching for a word in different languages. I can easily get the Locale of the language, but I don't know how to search for the word in another language. So the text can be in Chinese and the word to be…
Rohan
-1
votes
1 answer

How do I identify which letter of the alphabet a word starts with in Objective-C?

Given a string, I'm trying to determine which letter of the alphabet it belongs to. For example, "apple" goes into the "A" section. "Banana" goes into the "B" section. I'm using this to identify the section: NSRange range = [string…
Hilton Campbell