
Is there any way to transliterate English (Latin) Unicode text to Gujarati Unicode? For example, a (\u0061) should become અ (\u0095).

Umar Mansuri
  • 2
    Transliteration means replacing letters by some letters of another alphabet, so in the programming sense, it’s very simple string manipulation (read a string character by character, output characters in a table-driven manner). Depending on the transliteration scheme, there can be complications. But which scheme would you apply, and what do you see as problematic then? – Jukka K. Korpela Sep 13 '12 at 07:08
  • 1
    The character code U+0095 is wrong. Apparently the one you want is U+0A85. How other characters should be transliterated is not obvious; could you link to a spec, an implementation, or some additional examples? http://www.fileformat.info/info/unicode/char/a85/index.htm – tripleee Sep 13 '12 at 13:38

1 Answer


The Unicode CLDR provides files containing instructions on how to transliterate from Latin to Gujarati. The transform rules are in XML files written in the Locale Data Markup Language (LDML).

Latin-Gujarati involves:

  1. Filtering the string to characters or character ranges described by

    ['.0-9A-Za-z~À-ÅÇ-ÏÑ-ÖÙ-Ýà-åç-ïñ-öù-ýÿ-ďĒ-ĥĨ-İĴ-ķĹ-ľŃ-ňŌ-őŔ-ťŨ-žƠ-ơƯ-ưǍ-ǜǞ-ǣǦ-ǭǰǴ-ǵǸ-țȞ-ȟȦ-ȳʔ́̃-̄̆-̇̐̔-̣̥̱́̈́̕΅-ΆΈ-ΊΌΎ-ΐά-ΰό-ώϓЃЌЎЙйѓќўӁ-ӂӐ-ӑӖ-ӗӢ-ӣӮ-ӯḀ-ẙẠ-ỹἁἃ-ἅἇἉἋ-ἍἏἑἓ-ἕἙἛ-Ἕἡἣ-ἥἧἩἫ-ἭἯἱἳ-ἵἷἹἻ-ἽἿὁὃ-ὅὉὋ-Ὅὑὓ-ὕὗὙὛὝὟὡὣ-ὥὧὩὫ-ὭὯάέήίόύώᾁᾃ-ᾅᾇᾉᾋ-ᾍᾏᾑᾓ-ᾕᾗᾙᾛ-ᾝᾟᾡᾣ-ᾥᾧᾩᾫ-ᾭᾯ-ᾱᾴᾸ-ᾹΆῄΈΉ῎ῐ-ῑΐῘ-ῙΊ῞ῠ-ῡΰῥῨ-ῩΎ-Ῥ΅ῴΌΏK-Å\uE04D\uE064]

  2. Putting the result of the previous step in Normal Form D

  3. Lowercasing the result of the previous step

  4. Performing the Latin-InterIndic transform on the result of the previous step. As you can see from the file, this has already gotten pretty complicated, and I am not going into the details of this step.

  5. Performing the InterIndic-Gujarati transform on the result of the previous step. Same note as for the previous step.

  6. Putting the result of the previous step in Normal Form C
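Steps 2, 3 and 6 are plain Unicode operations that the Java standard library already covers. Here is a minimal sketch of just those three steps; the class and method names are made up for illustration, and the filter and the two CLDR transforms (steps 1, 4 and 5) are not implemented here.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper: only the normalization and lowercasing steps of the pipeline.
class NormalizationSteps {
    // Steps 2 and 3: decompose to Normal Form D, then lowercase.
    static String prepare(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.toLowerCase(Locale.ROOT);
    }

    // Step 6: recompose to Normal Form C.
    static String finish(String input) {
        return Normalizer.normalize(input, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // "Å" is a single code point; NFD splits it into "A" plus a combining ring.
        String prepared = prepare("\u00C5");
        System.out.println(prepared.length()); // 2: "a" + combining ring above
        // NFC puts the pair back together as the single code point "å".
        System.out.println(finish(prepared).length()); // 1
    }
}
```

Decomposing first means the transform rules in steps 4 and 5 only have to deal with base letters and separate combining marks, not with every precomposed accented letter.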

So if we do this for the letter "a" and skip right to step 4, the relevant transform rules are:

$wa=\uE005
a→$wa

We have "\uE005" now. Now step 5:

\uE005→અ

So we end up with અ (U+0A85), and it is unchanged by step 6.
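To make the two-stage lookup concrete, here is a toy sketch in Java that hard-codes only the two rules quoted above. The class name and the one-entry tables are assumptions for illustration; the real CLDR rule files are context-sensitive and far larger, so this is not a usable transliterator.

```java
import java.util.Map;

// Hypothetical toy: chains two lookup tables the way steps 4 and 5 are chained.
class ToyLatinGujarati {
    // Step 4 (Latin-InterIndic), reduced to the single rule for "a".
    static final Map<String, String> LATIN_TO_INTERINDIC = Map.of("a", "\uE005");
    // Step 5 (InterIndic-Gujarati), reduced to the single rule for the code point above.
    static final Map<String, String> INTERINDIC_TO_GUJARATI = Map.of("\uE005", "\u0A85");

    // Replace each code point found in the table; pass everything else through unchanged.
    static String applyTable(String input, Map<String, String> table) {
        StringBuilder out = new StringBuilder();
        input.codePoints().forEach(cp -> {
            String key = new String(Character.toChars(cp));
            out.append(table.getOrDefault(key, key));
        });
        return out.toString();
    }

    static String transliterate(String input) {
        String interIndic = applyTable(input, LATIN_TO_INTERINDIC);
        return applyTable(interIndic, INTERINDIC_TO_GUJARATI);
    }

    public static void main(String[] args) {
        String result = transliterate("a");
        // Print the code point rather than the glyph to avoid console encoding issues.
        System.out.println(String.format("U+%04X", result.codePointAt(0))); // U+0A85
    }
}
```

The intermediate private-use code point is what lets one Latin-InterIndic table be shared across several Indic target scripts, with only the second table changing per script.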


You probably want to look at CLDR Eclipse Setup, but I'm not sure whether these are just development tools for the CLDR maintainers, and I have no idea whether anyone has implemented a library for this in Java.

Esailija