
We were asked to write a program that reads a text file and shows a summary of its Unicode characters. While doing this, I ran into a problem: some Unicode characters are printed as a question mark in my console. However, when I output the same Unicode text using Swing, it is no longer a question mark.

    System.out.println("\u0126"); // appears to be ? in my console.

    JOptionPane.showMessageDialog(null,"\u0126"); // seems to display the character successfully

I could just leave the problem behind since I'm gonna make use of GUI, but I want an explanation, something that beginners like me could understand.

Why do some Unicode characters print as a question mark in the console but display correctly in Swing? (Eclipse, NetBeans, JCreator, and JGrasp all do the same; I thought it was a problem with my IDE.)

Is it a problem with the encoding or the font? And what should I do to display Unicode text in the console without question marks in the future?

Richard Chambers
misserandety
  • What terminal? What OS? The default Windows terminal (Cmd.exe) can only display the 256 characters in [code page 437](https://en.wikipedia.org/wiki/Code_page_437). – Adam Rosenfield Sep 17 '13 at 17:23
  • I'm sorry if my question wasn't clear; I was referring to the IDE console output. My bad. I'm using Windows 8. – misserandety Sep 17 '13 at 17:31
  • Strange, it works fine for me. What encoding are you using in your project? Also, what is the encoding of your input file? – Pshemo Sep 17 '13 at 17:35
  • @Pshemo I figured out that I was using cp1252. I changed it to UTF-8 and it worked, thanks to you. – misserandety Sep 17 '13 at 17:46
  • @misserandety I suspected something like that. Glad you solved it :) – Pshemo Sep 17 '13 at 17:47

3 Answers


Which Unicode characters are actually displayed depends on the glyphs included in the font you are using. Most fonts cover only a subset of the complete Unicode standard. For instance, if you want to display Simplified Chinese, the font you are using must have the glyphs for Simplified Chinese.

The Unicode Consortium has some information about this.

Richard Chambers
  • No font contains all Unicode characters (simply because Unicode has more characters than a single font can contain, due to restrictions of current font technologies). – Jukka K. Korpela Sep 17 '13 at 17:46
  • @JukkaK.Korpela Of course these days systems use font fallback in order to ensure any Unicode character can be displayed regardless of the font that's selected (so long as some installed font has that character, and if nothing else there's the [Last Resort Font](http://www.unicode.org/policies/lastresortfont_eula.html)). So even 'plain text' isn't limited to the 65536 glyphs of a single font. – bames53 Sep 17 '13 at 18:32
  • @bames53, it’s not “of course” at all, and it’s not about systems but about rendering software. – Jukka K. Korpela Sep 17 '13 at 19:47
  • Font fallback is built in to GTK, Qt, Java, Cocoa, etc. So on such platforms you can build the simplest possible app that has a text field and it will be able to display each character using the appropriate font. That's what I mean by 'system'; modern UI frameworks do this by default. It works even in terminal emulators like gtkTerm or Terminal.app. Presumably the 'console' the OP is using is cmd.exe, which unfortunately is not built using a modern framework. – bames53 Sep 17 '13 at 21:26

How to insert emojis and fancy Unicode Characters in Java?

It took me a while to find a solution on my own; I hope it helps :)

  // To display emojis from Java, make sure you have installed Windows Terminal.
  // To insert them, remember that a Java char holds only 16 bits, while
  // many Unicode characters (everything above U+FFFF, including emojis)
  // need more than that, so each of those characters must be split into
  // two 16-bit chars, called surrogates.


  // How to create surrogates and use emojis in Java! :D
  // 1. Copy the Unicode code point you want to use (omit "U+", that's only a label)
  //    https://unicode-table.com/en/1F680/
  // 2. Paste it in here and copy both surrogates
  //    http://www.russellcottrell.com/greek/utilities/SurrogatePairCalculator.htm
  // 3. Write them in a Java String, each one prefixed with a backslash and the letter u.
  // 4. Compile it and run it in Windows Terminal.
  //    #HappyCoding!!


  // One small thing: in my tests I had to put two spaces after every
  // surrogate pair, otherwise two question marks were printed next to it.

  // Another small thing: try to put emojis at the end of a line, because
  // in my tests text placed after the emoji was displayed strangely.


  String rocket = "Hello Entrepreneurs \uD83D\uDE80  ";
  System.out.print(rocket);
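
As a rough sketch of the same idea without the online calculator (an illustration, not part of the code above): the standard `Character.toChars` method splits a code point into its UTF-16 code units, so the surrogates do not have to be looked up by hand. The class and method names (`SurrogateSketch`, `fromCodePoint`, `toEscapes`) are made up for this example.

```java
class SurrogateSketch {

    // Builds a String from a code point; code points above U+FFFF
    // become two chars (a high and a low surrogate).
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Renders each UTF-16 code unit of the code point as a Java-style escape.
    static String toEscapes(int codePoint) {
        StringBuilder escapes = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            escapes.append(String.format("\\u%04X", (int) unit));
        }
        return escapes.toString();
    }

    public static void main(String[] args) {
        int rocketCodePoint = 0x1F680; // U+1F680 ROCKET
        String rocket = fromCodePoint(rocketCodePoint);

        System.out.println(toEscapes(rocketCodePoint));                // escapes for D83D and DE80
        System.out.println(rocket.length());                           // 2 chars
        System.out.println(rocket.codePointCount(0, rocket.length())); // 1 code point
        System.out.println(toEscapes(0x0126));                         // a single escape, no surrogates
    }
}
```

Characters at or below U+FFFF, such as U+0126, fit in a single `char`, so only the rocket needs a surrogate pair here. Whether the emoji then shows up correctly still depends on the terminal and its font.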

\u0126 is a character outside ASCII, the basic character set that most terminals output. The character is actually Ħ, but it will only show if your console can display that character.

You can check whether switching your console's encoding to UTF-8 allows you to show characters like that.
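
To make this a bit more concrete, here is a minimal sketch, assuming the question mark comes from the output encoding rather than from the font. It shows that Ħ has no byte in US-ASCII, so Java substitutes the byte for `?`, while UTF-8 encodes it as two bytes. It then prints through a `PrintStream` that is explicitly told to use UTF-8. The names `ConsoleEncodingSketch` and `encodeToHex` are made up for this example.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ConsoleEncodingSketch {

    // Encodes the text in the given charset and returns the bytes as hex.
    // String.getBytes replaces characters the charset cannot represent
    // with that charset's replacement byte.
    static String encodeToHex(String text, Charset charset) {
        StringBuilder hex = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            hex.append(String.format("%02X ", b));
        }
        return hex.toString().trim();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String text = "\u0126";

        // 3F is the byte for '?': the character was lost during encoding.
        System.out.println("US-ASCII: " + encodeToHex(text, StandardCharsets.US_ASCII));
        // C4 A6 is the two-byte UTF-8 encoding of U+0126.
        System.out.println("UTF-8:    " + encodeToHex(text, StandardCharsets.UTF_8));

        // Wrap System.out so the bytes sent to the console are UTF-8.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(text);
    }
}
```

The last line only displays Ħ if the console itself interprets the incoming bytes as UTF-8 and has a font with that glyph, so the console's encoding setting has to match as well.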

Arash Saidi