Questions tagged [diacritics]

A diacritic is "a mark near or through an orthographic or phonetic character or combination of characters indicating a phonetic value different from that given the unmarked or otherwise marked element" -- Merriam-Webster

From Wikipedia:

A diacritic (/daɪ.əˈkrɪtɨk/; also diacritical mark, diacritical point, diacritical sign) is a glyph added to a letter, or basic glyph. The term derives from the Greek διακριτικός (diakritikós, "distinguishing"). Diacritic is both an adjective and a noun, whereas diacritical is only an adjective. Some diacritical marks, such as the acute ( ´ ) and grave ( ` ) are often called accents. Diacritical marks may appear above or below a letter, or in some other position such as within the letter or between two letters.

The main use of diacritics in the Latin alphabet is to change the sound value of the letter to which they are added. Examples from English are the diaeresis in naïve and Noël, which shows that the vowel with the diaeresis mark is pronounced separately from the preceding vowel; the acute and grave accents, which indicate that a final vowel is to be pronounced, as in saké and poetic breathèd; and the cedilla under the "c" in the borrowed French word façade, which shows it is pronounced /s/ rather than /k/. In other Latin alphabets, they may distinguish between homonyms, such as French là "there" versus la "the," which are both pronounced [la]. In Gaelic type, a dot over consonants indicates lenition of the consonant in question.

In other alphabetic systems, diacritics may perform other functions. Vowel pointing systems, namely the Arabic harakat ( ـَ, ـُ, ـِ, etc.) and the Hebrew niqqud ( ַ, ֶ, ִ, ֹ , ֻ, etc.) systems, indicate sounds (vowels and tones) that are not conveyed by the basic alphabet. The Indic virama ( ् etc.) and the Arabic sukūn ( ـْـ ) mark the absence of a vowel. Cantillation marks indicate prosody. Other uses include the Early Cyrillic titlo ( ◌҃ ) and the Hebrew gershayim ( ״ ), which, respectively, mark abbreviations or acronyms, and Greek diacritics, which showed that letters of the alphabet were being used as numerals.

In orthography and collation, a letter modified by a diacritic may be treated either as a new, distinct letter or as a letter–diacritic combination. This varies from language to language, and may vary from case to case within a language.

In some cases, letters are used as "in-line diacritics" in place of ancillary glyphs, because they modify the sound of the letter preceding them, as in the case of the "h" in English "sh" and "th".

More information

the Merriam-Webster entry
the Wikipedia entry

1105 questions

4 answers

Python and character normalization

Hello, I retrieve text-based UTF-8 data from a foreign source. It contains special characters such as u"ıöüç", and I want to normalize them to English, such as "ıöüç" -> "iouc". What would be the best way to achieve this?

asked Nov 12 '10 at 07:52

Hellnar

62,315
79
204
279
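A minimal sketch of one common approach to the question above, assuming Python 3 and only the standard library: decompose the text with NFKD, drop the combining marks, then map the few letters that have no Unicode decomposition. The helper name `strip_accents` and the `EXTRA_MAP` table are hypothetical, and the table is illustrative rather than complete.

```python
import unicodedata

# Letters with no Unicode decomposition need an explicit table.
# Assumption: this short mapping is illustrative, not exhaustive.
EXTRA_MAP = str.maketrans({"ı": "i", "ø": "o", "đ": "d", "ł": "l", "ß": "ss"})


def strip_accents(text: str) -> str:
    """Remove combining marks, then map a few undecomposable letters."""
    # NFKD splits "ö" into "o" followed by U+0308 COMBINING DIAERESIS.
    decomposed = unicodedata.normalize("NFKD", text)
    # combining() returns 0 for base characters and non-zero for marks.
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.translate(EXTRA_MAP)


print(strip_accents("ıöüç"))  # iouc
```

The dotless "ı" survives normalization unchanged, which is why the explicit table is needed for that letter.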

4 answers

Remove Arabic Diacritic

I want PHP to convert this text: الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ to: الحمد لله رب العالمين. I am not sure where to start or how to do it. Absolutely no idea. I have done some research, found this link…

php arabic diacritics

asked Aug 29 '14 at 06:46

Syed Sajid

1,380
5
20
34
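The question above asks about PHP; as a language-neutral illustration, here is a hedged Python sketch of the same idea: delete the Arabic short-vowel marks by code point range. It assumes the marks of interest lie in U+064B through U+0652 (fathatan to sukun); other marks, such as U+0670 superscript alef, would have to be added to the pattern. The name `remove_harakat` is hypothetical.

```python
import re

# Assumption: the harakat to strip are U+064B (fathatan) .. U+0652 (sukun).
HARAKAT = re.compile("[\u064B-\u0652]")


def remove_harakat(text: str) -> str:
    """Delete Arabic vowel marks in the assumed range, keep everything else."""
    return HARAKAT.sub("", text)


# reh + fatha, beh + shadda + kasra  ->  reh, beh
sample = "\u0631\u064E\u0628\u0651\u0650"
print(remove_harakat(sample) == "\u0631\u0628")  # True
```

The same character class should carry over to a PHP `preg_replace` call with the `u` modifier, though that variant is not tested here.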

5 answers

How can Z͎̠͗ͣḁ̵͙̑l͖͙̫̲̉̃ͦ̾͊ͬ̀g͔̤̞͓̐̓̒̽o͓̳͇̔ͥ text be prevented?

I've read about how Zalgo text works, and I'm looking to learn how a chat or forum software could prevent that kind of annoyance. More precisely, what is the complete set of Unicode combining characters that needs to: a) either be stripped, assuming…

javascript unicode diacritics combining-marks zalgo

asked Mar 09 '14 at 00:47

Dan Dascalescu

143,271
52
317
404
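One possible mitigation for the Zalgo question above, sketched in Python rather than JavaScript: instead of stripping every combining character, cap how many may follow a single base character. The function name `limit_combining_marks` and the default cap of 2 are assumptions; a real limit depends on which scripts the forum has to support.

```python
import unicodedata


def limit_combining_marks(text: str, max_marks: int = 2) -> str:
    """Keep at most max_marks consecutive combining marks per base character."""
    out = []
    run = 0  # combining marks seen since the last base character
    # NFC first, so ordinary accented letters become single code points
    # and do not count against the cap.
    for ch in unicodedata.normalize("NFC", text):
        if unicodedata.category(ch).startswith("M"):  # Mn, Mc or Me
            run += 1
            if run > max_marks:
                continue  # drop the excess mark
        else:
            run = 0
        out.append(ch)
    return "".join(out)


cleaned = limit_combining_marks("a" + "\u0301" * 5)
print(len(cleaned))  # 3: precomposed a-acute plus two remaining marks
```

A cap rather than a full strip keeps legitimate text intact in scripts that stack more than one mark on a letter.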

7 answers

Regex accent insensitive?

I need a regex in a C# program. I have to capture a file name with a specific structure. I used the \w character class, but the problem is that this class doesn't match any accented characters. How can I do this? I just don't want to put the most used…

c# regex diacritics non-ascii-characters

asked Jul 12 '11 at 13:03

J4N

19,480
39
187
340

3 answers

ModuleNotFoundError: No module named 'unidecode' yet I have the module installed

I am trying to remove accents from a Python list of strings by converting it from UTF-8 to ASCII. I have read answers to multiple questions here on Stack Overflow that suggest using the unidecode function from the unidecode package. I have installed…

python python-3.x package diacritics

asked May 10 '19 at 19:32

Felipe Ito

6 answers

Regex to remove non-letter characters but keep accented letters

I have strings in Spanish and other languages that may contain generic special characters like (),*, etc. that I need to remove. But the problem is that they may also contain special language characters like ñ, á, ó, í etc., and those need to remain. So…

javascript regex string diacritics

asked Dec 01 '11 at 11:36

devjs11

1,898
7
43
73
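The question above concerns JavaScript; the idea can be sketched in Python, where `\w` on `str` patterns is Unicode-aware. This is an illustration under assumptions, not a port of any answer: the name `keep_letters` is hypothetical, and the pattern removes everything that is not a letter or whitespace. Text is normalized to NFC first, because a decomposed combining mark is not matched by `\w` and would otherwise be stripped from its letter.

```python
import re
import unicodedata

# Matches anything that is not a word character or whitespace,
# plus digits and underscore (which \w would otherwise keep).
NON_LETTERS = re.compile(r"[^\w\s]|[\d_]")


def keep_letters(text: str) -> str:
    """Remove punctuation, digits and symbols; keep letters and whitespace."""
    composed = unicodedata.normalize("NFC", text)
    return NON_LETTERS.sub("", composed)


assert keep_letters("¡Hola, señor (José)*!") == "Hola señor José"
```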

2 answers

python : working with german umlaut

months = ["Januar", "Februar", "März", "April", "Mai", "Juni", "Juli", "August", "September", "Oktober", "November", "Dezember"] print months[2].decode("utf-8") Printing month[2] fails with UnicodeDecodeError: 'utf8' codec can't decode bytes in…

python unicode diacritics

asked Aug 31 '11 at 07:52

deimus

9,565
12
63
107

6 answers

How to handle diacritics (accents) when rewriting 'pretty URLs'

I rewrite URLs to include the title of user-generated travel blogs. I do this for both readability of URLs and SEO purposes. http://www.example.com/gallery/280-Gorges_du_Todra/ The first integer is the id, the rest is for us humans (but is…

php url-rewriting diacritics

asked Jan 21 '09 at 16:34

Jacco

23,534
17
88
105
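For the pretty-URL question above, a hedged sketch of slug generation in Python (the question itself is about PHP). It assumes the underscore-separated style of the example URL and that dropping letters with no ASCII decomposition is acceptable; `slugify` is a hypothetical helper name.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Fold accents to ASCII and join the remaining words with underscores."""
    decomposed = unicodedata.normalize("NFKD", title)
    # Encoding with "ignore" drops the combining marks and any other
    # non-ASCII character (letters such as "ø" disappear entirely).
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumerics into one underscore.
    return re.sub(r"[^A-Za-z0-9]+", "_", ascii_only).strip("_")


print(slugify("Gorges du Todra"))         # Gorges_du_Todra
print(slugify("Crème brûlée à Orléans"))  # Creme_brulee_a_Orleans
```

Because the numeric id in the URL identifies the record, the lossy slug only needs to be readable, not reversible.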

5 answers

Why doesn't Đ get flattened to D when Removing Accents/Diacritics

I'm using this method to remove accents from my strings: static string RemoveAccents(string input) { string normalized = input.Normalize(NormalizationForm.FormKD); StringBuilder builder = new StringBuilder(); foreach (char c in…

c# .net string diacritics

asked Mar 02 '10 at 11:43

Mladen Prajdic

15,457
2
43
51
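A short Python check that may help explain the behaviour asked about above (the question uses C#, so this is only an illustration): the Unicode Character Database gives é a decomposition into a base letter plus a combining mark, but gives Đ (U+0110, D with stroke) none. Normalization therefore leaves Đ untouched, and there is no separate mark to filter out.

```python
import unicodedata

for ch in ("\u00e9", "\u0110"):  # é, Đ
    name = unicodedata.name(ch)
    # decomposition() returns an empty string when no mapping exists.
    mapping = unicodedata.decomposition(ch) or "(none)"
    print(f"U+{ord(ch):04X} {name}: {mapping}")

# U+00E9 LATIN SMALL LETTER E WITH ACUTE: 0065 0301
# U+0110 LATIN CAPITAL LETTER D WITH STROKE: (none)
```

Letters of this kind (Đ, Ø, Ł and similar) generally need an explicit replacement table in addition to normalization.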

5 answers

normalizing accented characters in MySQL queries

I'd like to be able to do queries that normalize accented characters, so that for example: é, è, and ê are all treated as 'e', in queries using '=' and 'like'. I have a row with username field set to 'rené', and I'd like to be able to match on it…

sql mysql utf-8 diacritics collate

asked Feb 20 '10 at 16:24

George Armhold

30,824
50
153
232

9 answers

ToAscii/ToUnicode in a keyboard hook destroys dead keys

It seems that if you call ToAscii() or ToUnicode() while in a global WH_KEYBOARD_LL hook, and a dead-key is pressed, it will be 'destroyed'. For example, say you've configured your input language in Windows as Spanish, and you want to type an…

windows unicode diacritics keyboard-hook

asked Dec 26 '09 at 23:24

00010000

4 answers

Mongodb match accented characters as underlying character

In MongoDB "db.foo.find()" syntax, how can I tell it to match all letters and their accented versions? For example, if I have a list of names in my database: João, François, Jesús. How would I allow a search for the strings "Joao", "Francois", or…

regex mongodb internationalization diacritics

asked Oct 10 '11 at 01:09

Josh

4,412
7
38
41

3 answers

Character encoding for French Accents

I'm developing my first website for a French client and I'm having massive issues with accents being displayed as "?". After googling it for days, I thought I understood, but the issues persist. To simplify it, I'll explain just the email headers (the…

php email diacritics

asked Apr 16 '11 at 22:21

denislexic

10,786
23
84
128

3 answers

Should all accented characters use html entities?

I am working with a large number of HTML files that are mostly encoded as UTF-8. There are accented characters galore, as many are in French. I have been converting them to HTML entities as I go, but I noticed that even in IE5.5 (according to IE Tester)…

html character-encoding html-entities diacritics

asked Mar 06 '12 at 15:48

Damon

10,493
16
86
144

1 answer

What's the correct algorithm to determine number of user-perceived-characters?

I have the task of counting the number of perceived characters in an input. The input is a group of ints (we can think of it as an int[]) which represents Unicode code points. java.text.BreakIterator.getCharacterInstance() is not allowed. (I mean…

java language-agnostic text unicode diacritics

asked Feb 01 '12 at 14:33

Pacerier

86,231
106
366
634
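A deliberately simplified Python sketch for the counting question above (which is about Java): treat every code point that is not a combining mark as the start of a new perceived character. This is only an approximation of the grapheme cluster rules in Unicode Standard Annex #29; it ignores Hangul jamo sequences, zero-width joiner emoji sequences, regional indicator pairs and CR LF. The name `count_perceived` is hypothetical.

```python
import unicodedata
from typing import Iterable


def count_perceived(code_points: Iterable[int]) -> int:
    """Approximate the number of user-perceived characters."""
    count = 0
    for cp in code_points:
        is_mark = unicodedata.category(chr(cp)).startswith("M")
        # A mark extends the previous character; a mark with nothing
        # before it is counted as a character of its own.
        if not is_mark or count == 0:
            count += 1
    return count


print(count_perceived([0x65, 0x301]))  # 1  ("e" + combining acute)
print(count_perceived([0x61, 0x62]))   # 2  ("a", "b")
```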
