
I have a piece of text, "¸ÁévÀAvÀæöå ºÉÆÃgÁlzÀ°è «zÁåyðUÀ¼ÀÆ ¥Á¯ÉÆÎArzÀÝgÀÄ.", which I copied from a web page that was served with UTF-8 encoding.

When I paste this text into the ASCII2UNICODE website and convert it, I get "ಸ್ವಾತಂತ್ರ್ಯ ಹೋರಾಟದಲ್ಲಿ ವಿದ್ಯಾರ್ಥಿಗಳೂ ಪಾಲ್ಗೊಂಡಿದ್ದರು.", which is exactly what I expect.

How can I do this in Java?

I have tried:

System.out.println(new String("¸ÁévÀAvÀæöå ºÉÆÃgÁlzÀ°è «zÁåyðUÀ¼ÀÆ ¥Á¯ÉÆÎArzÀÝgÀÄ.".getBytes("ASCII"), "UTF-8"));
System.out.println(new String("¸ÁévÀAvÀæöå ºÉÆÃgÁlzÀ°è «zÁåyðUÀ¼ÀÆ ¥Á¯ÉÆÎArzÀÝgÀÄ.".getBytes("UTF-8"), "ASCII"));

Now I am really not sure whether the source text is ASCII or UTF-8.
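
One way to narrow this down is to look at the characters the Java string actually holds. The following is a minimal sketch, not a definitive diagnosis: the class name `CharsetProbe` and its helper methods are made up for illustration, and the sample is only the first word of the text above, written with `\u` escapes so that the result does not depend on the encoding of the source file.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, introduced only for this illustration.
class CharsetProbe {

    /** True if every UTF-16 unit of s is in the 7-bit ASCII range. */
    public static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c <= 0x7F);
    }

    /** True if every UTF-16 unit of s fits in a single ISO-8859-1 byte. */
    public static boolean isLatin1(String s) {
        return s.chars().allMatch(c -> c <= 0xFF);
    }

    /** Encodes s as US-ASCII and decodes it again; unmappable chars become '?'. */
    public static String asciiRoundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // First word of the sample text, spelled with Unicode escapes.
        String sample = "\u00B8\u00C1\u00E9v\u00C0Av\u00C0\u00E6\u00F6\u00E5";

        System.out.println("ASCII only:   " + isAscii(sample));   // false
        System.out.println("Latin-1 only: " + isLatin1(sample));  // true
        System.out.println(asciiRoundTrip(sample));               // ???v?Av????
    }
}
```

For this sample the string is not ASCII, and every character lies in the Latin-1 range (U+0000 to U+00FF). That would explain why `getBytes("ASCII")` loses the information: each non-ASCII character is replaced by `?` before any decoding takes place.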

dertkw
Xavier DSouza
  • This is not just an encoding conversion. You can see how it is done in the source code of this project: https://github.com/aravindavk/ascii2unicode/blob/master/web/a2u.js – agad May 30 '14 at 07:39
  • OK. Is there any Java library that does the above conversion? – Xavier DSouza May 30 '14 at 07:46
  • It cannot be ASCII; most of those characters don't even exist in ASCII. The website you point to is also mis-named. The source text might be UTF-8 interpreted as some 8-bit codepage like ISO-8859-1. So take the string (which will be represented as Unicode in Java), encode it in ISO-8859-1, take those bytes, and decode them as UTF-8. – Thomas May 30 '14 at 08:15
  • You can take a look at http://www.ibm.com/developerworks/data/library/techarticle/dm-1212transliteration/index.html?ca=drs. The following link may also be useful: http://en.wikipedia.org/wiki/Devanagari_transliteration – agad May 30 '14 at 11:42
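
The round trip suggested in Thomas's comment (encode the string as ISO-8859-1, then decode those bytes as UTF-8) can be sketched as below. This is an illustration under stated assumptions rather than a solution: the class name `Latin1ToUtf8` and the method `reinterpret` are hypothetical, and the sample is the first few characters of the text from the question, written with `\u` escapes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, introduced only for this illustration.
class Latin1ToUtf8 {

    /** Re-reads the ISO-8859-1 bytes of the given string as UTF-8. */
    public static String reinterpret(String text) {
        byte[] raw = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Genuine mojibake: the UTF-8 bytes of U+00E9 (C3 A9) misread as Latin-1.
        System.out.println(reinterpret("\u00C3\u00A9")); // é

        // First characters of the sample text from the question.
        String sample = "\u00B8\u00C1\u00E9v\u00C0";
        String result = reinterpret(sample);

        // Invalid UTF-8 sequences are decoded as U+FFFD (the replacement character).
        System.out.println("contains U+FFFD: " + result.contains("\uFFFD")); // true
    }
}
```

The technique works for text that really is UTF-8 misread as ISO-8859-1, as the first call shows. For the sample from the question the bytes do not form valid UTF-8, so the result contains replacement characters instead of Kannada. That is consistent with agad's comment that this is not just an encoding conversion; the linked a2u.js source is the place to see what the website actually does.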

0 Answers