I have searched Stack Overflow and Google exhaustively, but so far I have been unable to find anyone else with a similar problem.

In a sample Java Swing test program, I create a plain JTextField so that I can try pasting characters into it from a webpage (http://isthisthingon.org/unicode/). When I test with '㓿' (code point 13567, U+34FF), the paste works. This character is the last listed character in the CJK Ideograph Extension A block. However, when I move to the next related block, CJK Ideograph Extension B, copying and pasting the character '𠀀' (code point 131072, U+20000) fails. It does not render a box or any sort of glyph; it appears as if I had nothing in the system clipboard at all.

I realize that CJK Ideograph Extension B is a set of characters that are considered "supplementary" and need two 16-bit code units (a surrogate pair) instead of one when Java encodes them internally as UTF-16. Further testing shows that I am able to display the supplementary characters if I hard-code the text into a display area.
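
For reference, here is a minimal sketch of what that two-unit encoding looks like from the Java side, using the two code points mentioned above. The class name `SurrogateDemo` and its helper methods are made up for illustration; the only library calls are `Character` and `String` methods that have existed since Java 5.

```java
class SurrogateDemo {
    // Number of UTF-16 code units (Java chars) needed to encode one code point.
    static int unitCount(int codePoint) {
        return Character.charCount(codePoint);
    }

    // Build a String from a single code point; a supplementary code point
    // becomes a high/low surrogate pair.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String bmp = fromCodePoint(0x34FF);   // code point 13567, Extension A
        String supp = fromCodePoint(0x20000); // code point 131072, Extension B

        System.out.println(bmp.length());                              // 1
        System.out.println(supp.length());                             // 2
        System.out.println(supp.codePointCount(0, supp.length()));     // 1
        System.out.println(Character.isHighSurrogate(supp.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(supp.charAt(1)));  // true
    }
}
```

So the Extension B character is a single code point but two `char` values, which is the case that the paste path would have to handle as a unit.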

This was tested using Windows 7 and Java 6.

I understand that support for supplementary Unicode characters was added as of Java 5; however, I am wondering why (or if) the cut and paste functionality in Swing still does not allow me to paste these characters. Is there something additional I need to do to tell Java to handle these characters when using the JTextField or JTextArea classes? Is there a way yet for Java's Swing libraries to paste these characters into a text field?

Thank you for your time!

Locriansax
  • No sooner did I post this than I may have found my answer. This has been a long-standing bug in the JDK - http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6877495. – Locriansax Aug 11 '11 at 15:53
  • Unicode has had more characters than fit in a 16-bit integer for most of its lifetime! I can’t believe that Java is still screwed up with this. But yesterday I found yet another UCS-2 bug in the Java String class, one that’s been there forever. This is ridiculous. The whole UTF-16 thing is a horrible curse, and Java will never be free of the countless bugs it causes. They are simply everywhere and it is maddening. People just can’t get things right. – tchrist Aug 12 '11 at 01:18
  • Thanks Alexey! just created an answer. :) – Locriansax Aug 12 '11 at 15:40
  • @tchrist - what was the bug that you found in the String class? If it was submitted as an official bug could you post the link too? I've been doing a lot of work with i18n stuff here at work and the more I know about Java's quirks with respect to the supplementary character set, the better! – Locriansax Aug 12 '11 at 15:43
  • @Locriansax: No, I didn’t file a bug report; I mailed it to the i18n-dev openjdk list that I’m on. You can find [that mail right here](http://www.mail-archive.com/i18n-dev@openjdk.java.net/msg00398.html). The problem is that the code processes things by partial code points, not full ones, so it gets wrong answers. It snuck by till Unicode 3.1 showed up in March 2001, because that introduced the Deseret script, which is a case-changing script up in the astral planes. It’s been broken >10 years. I hold all char-based Java code so highly suspect that it’s guilty till proven innocent. Safe assumption. – tchrist Aug 12 '11 at 18:49
  • Fabulous - thanks for the link! – Locriansax Aug 12 '11 at 19:23
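
To illustrate the general kind of failure described in the comments above (this is a generic sketch, not the specific bug from the linked mail), compare a loop that walks a string by `char` with one that walks it by code point. The class `CodePointWalk` and its method names are hypothetical; the test string pairs an ASCII letter with U+10400, a capital letter from the Deseret script mentioned above.

```java
class CodePointWalk {
    // Char-based loop: inspects each UTF-16 code unit separately, so the two
    // halves of a surrogate pair are never recognized as a letter.
    static int lettersByChar(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (Character.isLetter(s.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    // Code-point-based loop: reads a whole code point at each step and
    // advances by one or two chars accordingly.
    static int lettersByCodePoint(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isLetter(cp)) {
                count++;
            }
            i += Character.charCount(cp);
        }
        return count;
    }

    public static void main(String[] args) {
        // "a" followed by U+10400 (a Deseret capital letter).
        String s = "a" + new String(Character.toChars(0x10400));
        System.out.println(lettersByChar(s));      // 1
        System.out.println(lettersByCodePoint(s)); // 2
    }
}
```

The char-based version silently undercounts because each surrogate on its own is not a letter, which is why char-by-char code deserves a second look whenever supplementary characters are possible.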

1 Answer

No sooner did I post this than I may have found my answer. This has been a long-standing bug in the JDK: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6877495.

Locriansax