
I have a string containing a special character such as "\u2012" (FIGURE DASH). When I try to print it to the console, I get a '?' instead of the symbol. I have an editor in which I can insert the symbol using Alt+numpad, e.g. Alt+2012. In the editor I can see the symbol, but when I save it to an XML file and read the value back using nodevalue, I get a '?'.
To summarize, I am having trouble reading characters from the extended Latin character set. What I need is that when I insert such symbols and read them back, I get something like &#xXXXX;. Please help!

TIA :)

Put simply: I have a String inpath = "À"; and I want to get its Unicode value, like &#xXXXX;

Vix
  • `\uXXXX`, `&#xXXXX;` and single characters entered via `alt+2012` are all different things. If you enter a single character and it turns into `?`, it means that somewhere along the chain of saving, reading and outputting the file, its encoding was not handled correctly. It's impossible to say any more about this with the information you gave. – deceze Jan 18 '13 at 01:22
  • Java on the Windows platform. – Vix Jan 18 '13 at 01:34
  • `&#xXXXX;` is not the "Unicode value" of "À". `&#xXXXX;` is the HTML entity encoding for that character, which is only relevant in an HTML context. It is pointless if you want to display the character "À" in a Java app/console/other non-HTML context. – deceze Jan 18 '13 at 01:37
  • I agree with your point, but is there a way to decode `À` to `&#xC0;`? http://www.fileformat.info/info/unicode/char/00c0/index.htm – Vix Jan 18 '13 at 01:50
  • To get the Unicode value of a 1-character string, use `"x".codePointAt(0)`. – Mechanical snail Jan 18 '13 at 01:53
  • What if I have this char in `This À is value of my test`? I cannot go through each character, and I may have any other such special chars. – Vix Jan 18 '13 at 02:00
  • There sure are ways to encode a character to its HTML entity. My point though is that you should clarify what you're asking for and what the problem is. Your current question/statement is conflating many different things. If you have a specific question about how to do something specific in a specific language, please put that all into the question. *Tag* the question with the appropriate language so the right people can find it. – deceze Jan 18 '13 at 02:55

1 Answer


The default console encoding on Windows is an MS-DOS (OEM) code page, and those code pages don't support the character. You can try running `chcp 65001` before running the program, but you might also need to change the console font.
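On the Java side, a minimal sketch of sending UTF-8 bytes to the console could look like the following. The class name `Utf8Console` and the helper `utf8Stream` are made up for illustration, and it assumes the console has already been switched with `chcp 65001` and uses a font that contains the glyphs. Without that, the bytes are still correct UTF-8 but will not render properly.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper class, not part of the original question or answer.
class Utf8Console {

    // Wraps any stream in a PrintStream that encodes text as UTF-8
    // instead of the platform default encoding.
    static PrintStream utf8Stream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8Stream(System.out);
        // \u escapes keep this source file independent of its own encoding.
        out.println("figure dash: \u2012, A with grave: \u00C0");
    }
}
```

Whether the symbols actually show up still depends on the console code page and font, which is outside Java's control.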

You don't need to do anything you wouldn't do with any other character, as long as you use UTF-8 everywhere, and it looks like you aren't doing that in several places. You need to explicitly specify UTF-8 in your code when saving and reading the file, rather than relying on the platform default encoding.
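As a rough sketch of what "explicitly UTF-8" might look like, the snippet below writes a string to a temporary file and reads it back with the charset named on both sides. It also includes a small helper that turns every non-ASCII code point into the `&#xXXXX;` form asked for in the question, walking the whole string by code point. The names `Utf8RoundTrip`, `roundTripViaTempFile` and `toNumericEntities` are hypothetical, and the helper deliberately does not escape `&` or `<`, so treat it as an illustration rather than a complete XML escaper.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical helper class, not part of the original question or answer.
class Utf8RoundTrip {

    // Writes the text as UTF-8 and reads it back as UTF-8,
    // never relying on the platform default encoding.
    static String roundTripViaTempFile(String text) {
        try {
            Path file = Files.createTempFile("utf8-demo", ".txt");
            try {
                Files.write(file, text.getBytes(StandardCharsets.UTF_8));
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Replaces every code point above ASCII with a hexadecimal
    // numeric character reference such as &#xC0;.
    static String toNumericEntities(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp < 0x80) {
                sb.append((char) cp);
            } else {
                sb.append("&#x")
                  .append(Integer.toHexString(cp).toUpperCase(Locale.ROOT))
                  .append(';');
            }
            // Step by one or two chars so surrogate pairs stay together.
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "This \u00C0 is value of my test";
        String readBack = roundTripViaTempFile(original);
        // The output is pure ASCII, so it prints the same on any console.
        System.out.println(toNumericEntities(readBack));
    }
}
```

Because the escaped output contains only ASCII, it can be printed even on a console that cannot display the original characters, which makes it handy for checking what was really read from the file.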

Esailija