Questions tagged [transliteration]

Transliteration refers to the process of mapping letters or glyphs from one script or writing system to another.

Transliteration is the conversion of letters from one alphabet to another, for example from Greek to Latin. It can also be a simplification within a single alphabet, such as omitting the diacritics found in that alphabet or replacing special characters with a sequence of characters without diacritics.
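Both senses of the term can be illustrated with a minimal Python sketch. The `GREEK_TO_LATIN` table and the two function names are hypothetical and deliberately incomplete; they only show the idea, and a real project would more likely rely on a library such as ICU.

```python
import unicodedata

# Hypothetical, deliberately incomplete Greek-to-Latin table (illustration only).
GREEK_TO_LATIN = {
    "α": "a", "β": "b", "γ": "g", "δ": "d", "ε": "e",
    "λ": "l", "ο": "o", "σ": "s", "ς": "s",
}


def strip_diacritics(text: str) -> str:
    """Simplify within one alphabet: decompose (NFD), then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def greek_to_latin(text: str) -> str:
    """Map between alphabets: look each character up, leave unknown ones as-is."""
    return "".join(GREEK_TO_LATIN.get(ch, ch) for ch in text)


print(strip_diacritics("šñć"))                   # snc
print(greek_to_latin(strip_diacritics("λόγος")))  # logos
```

Note that stripping combining marks turns "Möbel" into "Mobel", not "Moebel"; language-specific conventions like that need explicit rules on top of this approach.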

257 questions
4
votes
1 answer

ICU custom transliteration

I am looking to utilize the ICU library for transliteration, but I would like to provide a custom transliteration file for a set of specific custom transliterations, to be incorporated into the ICU core at compile time for use in binary form…
NatHillard
4
votes
1 answer

What's the limit of Google transliteration?

I've used the Google Transliteration API experimentally. It works fine, but I've noticed that it allows only five words at a time. Is there any way to send more words? And is there a daily limit? If I have 100 words, I will have to send a set of…
wasimbhalli
4
votes
1 answer

Converting accents to ASCII in R

I'm trying to convert special characters to ASCII in R. I tried using Hadley's advice in this question: stringi::stri_trans_general('Jos\xe9', 'latin-ascii') But I get "Jos�". I'm using stringi v1.1.1. I'm running a Mac. My friends who are running…
Huey
4
votes
0 answers

Language detection for pinyin, translit etc?

Real-world user-generated text in non-Latin alphabet languages is often not in canonical form but in translit, shlyokavitsa, arabizi, pinyin and so on. Language detection software is starting to handle it smartly, but usually it doesn't work, even…
Adam Bittlingmayer
4
votes
1 answer

Google Transliteration - word suggestion while typing

I am using Google Transliteration in my project. I tried the code snippet provided in the Google Transliterate API Developer's Guide. However, little documentation is available and I have very little idea how this works. The code works in a way that it…
Shri
4
votes
0 answers

Proper Name Transliteration API

I'm looking for a transliteration API. I checked Google Translate API and Microsoft Translator API, but neither can handle name transliteration to English (phonetic spelling of the name in English). The only relevant API I found so far was Google…
ron
4
votes
2 answers

Calling iconv via PHP produces different results in Apache and Command Line

I am trying to use iconv to remove accents from names using this function $name = iconv('UTF-8', 'ASCII//TRANSLIT', $original) So if $original was 'šñć' I would expect 'snc'. Running this via a PHP script in the command line does produce the…
bScutt
4
votes
3 answers

PHP transliterate specify locale

I am using the PHP Transliterator (from php5-intl, using ICU) to transliterate CJK to Latin (romanization). The problem is that I need a way to specify the input locale so that Japanese Kanji are not romanized into Chinese Pinyin (as they often share the…
bitinn
4
votes
0 answers

How do I get ICU to transliterate from any Unicode to Latin1 (ISO-8859-1) in C++?

I can get ICU to transliterate to Latin using "Any-Latin" but this still includes characters, e.g. macrons, that are not in the Latin1 codepage. I can get it to transliterate to ASCII using "Any-Latin; Latin-ASCII" but then I lose all the accented…
Unripe
4
votes
0 answers

Transliteration between different writing systems

I need to learn how to change a transliteration of a text to another writing system. Apparently the best way would somehow involve regular expressions and Perl, probably from the command line? I've used regular expressions before in Notepad++ and…
nikopartanen
4
votes
2 answers

Convert optically equivalent Unicode strings to ASCII in Java?

I run a social network that requires Unicode usernames to be unique (as expected). Some creative users have started using Cyrillic (and other) Unicode characters to create optically equivalent (but Unicode-distinct) usernames. For example, they'll…
OnesAndZeroes
4
votes
2 answers

Where can I find a list of IDs or rules for the PHP transliterator (Intl)?

Transliterator::listIDs() will list IDs, but apparently it's not a complete list. In the example from this page, the ID looks like: Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC; [:Punctuation:] Remove; Lower(); which is kind of weird, because…
nice ass
4
votes
1 answer

Converting Unicode characters into the equivalent ASCII ones

I need to "flatten out" a number of Unicode strings for the purposes of indexing and searching. For example, I need to convert GötheФ€ into ASCII. The last two characters have no close representations in ASCII, so it's OK to discard them completely.…
Desmond Hume
3
votes
1 answer

Usage of iconv function in R to transliterate German words

I am trying to use the iconv function in R to achieve the correct transliteration of German words (for example, Möbel → Moebel). I have written the following code (tried with English/German locales): iconv("Möbel", "latin1", "ASCII//TRANSLIT") [1]…
manro
3
votes
5 answers

Exclude specific characters from Transliterator conversion

I'm trying to do a transliteration using PHP, but what I need is to convert all non-Latin characters while keeping the Italian accented characters (àèìòù). The PHP Transliterator lacks documentation and online examples. I've read the ICU docs…
Ma3x