Questions tagged [diacritics]

A diacritic is "a mark near or through an orthographic or phonetic character or combination of characters indicating a phonetic value different from that given the unmarked or otherwise marked element" -- Merriam-Webster

From Wikipedia:

A diacritic (/daɪ.əˈkrɪtɨk/; also diacritical mark, diacritical point, diacritical sign) is a glyph added to a letter, or basic glyph. The term derives from the Greek διακριτικός (diakritikós, "distinguishing"). Diacritic is both an adjective and a noun, whereas diacritical is only an adjective. Some diacritical marks, such as the acute ( ´ ) and grave ( ` ) are often called accents. Diacritical marks may appear above or below a letter, or in some other position such as within the letter or between two letters.

The main use of diacritics in the Latin alphabet is to change the sound value of the letter to which they are added. Examples from English are the diaeresis in naïve and Noël, which shows that the vowel with the diaeresis mark is pronounced separately from the preceding vowel; the acute and grave accents, which indicate that a final vowel is to be pronounced, as in saké and poetic breathèd; and the cedilla under the "c" in the borrowed French word façade, which shows it is pronounced /s/ rather than /k/. In other Latin alphabets, they may distinguish between homonyms, such as French là "there" versus la "the," which are both pronounced [la]. In Gaelic type, a dot over consonants indicates lenition of the consonant in question. In other alphabetic systems, diacritics may perform other functions. Vowel pointing systems, namely the Arabic harakat ( ـَ, ـُ, ـِ, etc.) and the Hebrew niqqud ( ַ, ֶ, ִ, ֹ , ֻ, etc.) systems, indicate sounds (vowels and tones) that are not conveyed by the basic alphabet. The Indic virama ( ् etc.) and the Arabic sukūn ( ـْـ ) mark the absence of a vowel. Cantillation marks indicate prosody. Other uses include the Early Cyrillic titlo ( ◌҃ ) and the Hebrew gershayim ( ״ ), which, respectively, mark abbreviations or acronyms, and Greek diacritics, which showed that letters of the alphabet were being used as numerals.

In orthography and collation, a letter modified by a diacritic may be treated either as a new, distinct letter or as a letter–diacritic combination. This varies from language to language, and may vary from case to case within a language.

In some cases, letters are used as "in-line diacritics" in place of ancillary glyphs, because they modify the sound of the letter preceding them, as in the case of the "h" in English "sh" and "th".
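The distinction between a distinct letter and a letter–diacritic combination also shows up in how text is encoded. The following is a minimal Python sketch, assuming Unicode input and using only the standard `unicodedata` module; the helper name `strip_marks` is hypothetical and chosen for illustration. It shows that "é" can be stored either as one precomposed code point or as a base letter plus a combining mark, and that dropping the combining marks after decomposition removes many, though not all, diacritics.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Decompose text (NFD) and drop combining marks.

    Hypothetical helper for illustration: letters that Unicode treats as
    distinct characters with no decomposition (for example "ø") pass
    through unchanged.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


precomposed = "\u00e9"   # "é" as a single code point
combined = "e\u0301"     # "e" followed by COMBINING ACUTE ACCENT

# The two spellings look the same but compare unequal until normalized.
print(precomposed == combined)
print(unicodedata.normalize("NFC", combined) == precomposed)

print(strip_marks("naïve façade"))
print(strip_marks("Åland"))
print(strip_marks("ø"))
```

Whether stripping marks is appropriate depends on the language: as noted above, some orthographies treat the marked letter as a separate letter, so this approach is better suited to loose matching than to display.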

1105 questions
38
votes
6 answers

How to remove accents from values in columns?

How do I change the special characters to the usual alphabet letters? This is my dataframe: In [56]: cities Out[56]: Table Code Country Year City Value 240 Åland Islands 2014.0 MARIEHAMN 11437.0 1 240 …
Marius
  • 397
  • 1
  • 3
  • 5
37
votes
3 answers

Java string searching ignoring accents

I am trying to write a filter function for my application that will take an input string and filter out all objects that don't match the given input in some way. The easiest way to do this would be to use String's contains method, i.e. just check…
DaveJohnston
  • 10,031
  • 10
  • 54
  • 83
37
votes
8 answers

Convert accented characters to their plain ascii equivalents

I have to convert French characters into English in my PHP code. I've used the following code: iconv("utf-8", "ascii//TRANSLIT", $string); But the result for ËËË was "E"E"E. I don't need that double quote and other extra characters - I want to show an…
ram
  • 593
  • 1
  • 8
  • 18
35
votes
4 answers

How to generate javadoc documentation with umlauts?

I am trying to generate Java documentation in Eclipse. The source files are UTF-8 encoded and contain some umlauts. The resulting HTML files do not specify an encoding and do not use HTML entities, so the umlauts aren't displayed correctly in any…
Kim Stebel
  • 41,826
  • 12
  • 125
  • 142
34
votes
12 answers

Test if string contains only letters (a-z + é ü ö ê å ø etc..)

I want to match a string to make sure it contains only letters. I've got this and it works just fine: var onlyLetters = /^[a-zA-Z]*$/.test(myString); But since I speak another language too, I need to allow all letters, not just A-Z. Also for…
patad
  • 9,364
  • 11
  • 38
  • 44
32
votes
2 answers

Get CSV Data from Clipboard (pasted from Excel) that contains accented characters

SCENARIO: My users will copy cells from Excel (thus placing them on the clipboard), and my application will retrieve those cells from the clipboard. THE PROBLEM: My code retrieves the CSV format from the clipboard. However, if the original Excel…
namenlos
  • 5,111
  • 10
  • 38
  • 38
31
votes
11 answers

Replacing diacritics in Javascript

How can I replace diacritics (ă,ş,ţ etc) with their "normal" form (a,s,t) in javascript?
Paul Grigoruta
  • 2,386
  • 1
  • 20
  • 25
30
votes
3 answers

The encoding that Notepad++ just calls "ANSI", does anyone know what to call it for Ruby?

I have a bunch of .txt's that Notepad++ says (in its drop-down "Encoding" menu) are "ANSI". They have German characters in them, [äöüß], which display fine in Notepad++. But they don't show up right in irb when I File.read 'this is a German text…
Owen_AR
  • 2,867
  • 5
  • 20
  • 23
29
votes
9 answers

How to remove accents in MySQL?

I've just compiled a database of 1 million place names. I'm going to use it in an auto-complete widget to look up cities. A lot of these places have accents... I want to be able to find records when a user types the name without an accent. In order…
mpen
  • 272,448
  • 266
  • 850
  • 1,236
28
votes
11 answers

Remove diacritics from a string

Is it possible? This is my input string: ľ š č ť ž ý á í é Č Á Ž Ý This is the output I want: l s c t z y a i e C A Z Y
Richard Knop
  • 81,041
  • 149
  • 392
  • 552
27
votes
3 answers

Accent insensitive search query in MySQL

Is there any way to make a search query accent insensitive? The column's and table's collations are utf8_polish_ci and I don't want to change them. Example word: toruń. select * from pages where title like '%torun%' It doesn't find "toruń". How can I…
Okan Kocyigit
  • 13,203
  • 18
  • 70
  • 129
27
votes
9 answers

Save Accents in MySQL Database

I'm trying to save French accents in my database, but they aren't saved as they should be in the DB. For example, an "é" is saved as "Ã©". I've tried to set my files to "Unicode (utf-8)", the fields in the DB are "utf8_general_ci" as well as the DB…
Ebpo
  • 794
  • 3
  • 12
  • 22
27
votes
5 answers

How to replace unicode characters by ascii characters in Python (perl script given)?

I am trying to learn python and couldn't figure out how to translate the following perl script to python: #!/usr/bin/perl -w use open qw(:std :utf8); while(<>) { s/\x{00E4}/ae/; s/\x{00F6}/oe/; s/\x{00FC}/ue/; …
Frank
  • 64,140
  • 93
  • 237
  • 324
25
votes
4 answers

How to ignore accent in SQLite query (Android)

I am new to Android and I'm working on a query in SQLite. My problem is that when I use accents in strings, e.g. ÁÁÁ ááá ÀÀÀ ààà aaa AAA, and I do: SELECT * FROM TB_MOVIE WHERE MOVIE_NAME LIKE '%a%' ORDER BY MOVIE_NAME; it returns: AAA aaa (It's…
andrehsouza
  • 479
  • 1
  • 5
  • 14
23
votes
6 answers

How to ignore acute accent in a javascript regex match?

I need to match a word like 'César' with a regex like /^cesar/i. Is there an option like /i to configure the regex so that it ignores acute accents? Or is the only solution to use a regex like /^césar/i?