Eclipse detail formatter string not displaying all Unicode characters

Question

I would like to see the clipboard symbol 📋 (U+1F4CB) in the debugger.

I understand that it is represented by two UTF-16 code units (a surrogate pair).

Where:

\ud83d is the high surrogate
\udccb is the low surrogate
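
As a quick check of the surrogate pair for U+1F4CB, here is a minimal sketch. The class name SurrogateDemo and the helper units are made up for illustration; the only JDK calls used are Character.highSurrogate and Character.lowSurrogate.

```java
public class SurrogateDemo {
    // Hypothetical helper: returns the two UTF-16 code units of a
    // supplementary code point as lowercase hex, separated by a space.
    static String units(int codePoint) {
        char high = Character.highSurrogate(codePoint);
        char low = Character.lowSurrogate(codePoint);
        return Integer.toHexString(high) + " " + Integer.toHexString(low);
    }

    public static void main(String[] args) {
        System.out.println(units(0x1F4CB)); // prints "d83d dccb"
    }
}
```

So the pair for the clipboard symbol should be d83d followed by dccb.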

I would like a detail formatter that shows it as the proper Unicode character in the debug tooltip.

My current detail formatter (Preferences -> Java -> Debug -> Detail Formatters) is:

new String(this.getBytes("utf8"), java.nio.charset.Charset.forName("utf8")).concat(" <---")

(The code above does nothing except append " <---" to the detail view.)

Question 1:

What formatter do I need to see the character displayed correctly in the yellow tooltip?

Source

import java.nio.charset.Charset;

public class Test {
    public static void main(String[] args) {
        byte[] db = new byte[] { -16, -97, -109, -117 };
        String x = new String(db, Charset.forName("utf8"));
        System.out.println(x);
        return;
    }
}

Holger · Answer 1 · 2018-07-03T08:39:45.853

The “📋” character is defined in the Unicode character set, and since String instances are sequences of Unicode characters, they may contain it. But it lies outside the Basic Multilingual Plane, so software processing it has to handle it with more care. Most notably, software must not treat it as individual char values, which are UTF-16 code units; such a character has to be processed as a pair of surrogates, i.e. as a single code point.
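
To make the difference between char values and code points concrete, here is a small sketch. The class CodePointDemo and its helper are hypothetical names; String.length, String.codePointCount and String.codePoints are standard JDK methods (codePoints requires Java 8 or later).

```java
import java.util.stream.Collectors;

public class CodePointDemo {
    // Hypothetical helper: lists the code points of a string in U+XXXX notation.
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String s = new String(Character.toChars(0x1F4CB)) + "!";
        System.out.println(s.length());                      // 3 char values
        System.out.println(s.codePointCount(0, s.length())); // 2 code points
        System.out.println(codePoints(s));                   // U+1F4CB U+0021
    }
}
```

Code that iterates by char sees three units here, while code that iterates by code point sees the two actual characters.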

Your detail formatter specified as

new String(this.getBytes("utf8"), java.nio.charset.Charset.forName("utf8")) …

doesn’t help here, as this.getBytes("utf8") converts the Unicode String instance to a byte[] array in the UTF-8 encoding, which is then passed to the new String(…, Charset.forName("utf8")) constructor, converting the byte array back to an identical String instance. If Eclipse’s debugger failed to render the original string, it won’t suddenly do it correctly with an identical string after that redundant operation.
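
The redundancy of that round trip can be checked directly. This is a minimal sketch with a made-up class name, RoundTripDemo, using the byte values from the question and StandardCharsets.UTF_8 instead of Charset.forName("utf8").

```java
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Encodes to UTF-8 and decodes again, like the detail formatter in the question.
    static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = new String(new byte[] { -16, -97, -109, -117 }, StandardCharsets.UTF_8);
        System.out.println(original.equals(roundTrip(original))); // prints true
    }
}
```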

Generally, if Eclipse’s debugger is incapable of correctly rendering strings containing characters outside the Basic Multilingual Plane, there is nothing you can do in a Detail Formatter to fix that, because whatever processing you do there eventually ends up in a String, perhaps after applying a chain of Detail Formatters. So the end result can only be one of two things: a String with the problematic character removed, or a String which Eclipse’s debugger can’t render correctly.
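
To illustrate the first of those two outcomes, here is a sketch that replaces every character outside the BMP with a textual marker. The class EscapeSupplementary, the method escape and the "<U+...>" notation are all assumptions made for this example, and it has not been verified how such an expression behaves when used as an actual Detail Formatter; it only shows that the character itself is gone from the resulting String.

```java
public class EscapeSupplementary {
    // Hypothetical helper: keeps BMP characters as they are and replaces each
    // supplementary code point with a marker such as <U+1F4CB>.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (Character.isSupplementaryCodePoint(cp)) {
                sb.append(String.format("<U+%X>", cp));
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String x = "a" + new String(Character.toChars(0x1F4CB));
        System.out.println(escape(x)); // prints a<U+1F4CB>
    }
}
```

The output identifies the code point but does not render the symbol, which is the limitation described above.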

In other words, this is a bug that can only get fixed on Eclipse’s side.

edited Jul 03 '18 at 08:39

answered Jul 02 '18 at 15:22

Holger


A UTF-8 character is a Unicode character in the UTF-8 encoding. By using the term "UTF-8 character" I convey, first, that it is a Unicode character and, second, how that Unicode character is encoded: in UTF-8. There is no atomic thing like a UTF-8 character, but there *is such a thing as a "UTF-8 character"*! I disagree with "if (...) there is nothing you can do (...)", because I could write a bug report or wait for existing bug reports to be resolved. – Grim Jul 03 '18 at 07:20
@PeterRader I clearly said “there is nothing you can do **in a Detail Formatter**”. That doesn’t exclude bug reports. Anyway, your terminology is way off. Your String is **not** encoded in UTF-8. It is encoded in whatever the `String` implementation uses internally, usually UTF-16, which doesn’t matter, as the software interface is defined in terms of Unicode. As explained, your conversion to UTF-8 and back to a `String` is entirely redundant, as the resulting string is identical to the original string. That demonstrates how nonsensical it is to speak of a “UTF-8 character”. – Holger Jul 03 '18 at 07:55
I am fully aware that the conversion to bytes and back to a string is redundant, but it forces people to think before they vote, answer, or upvote nonsensical answers. – Grim Jul 03 '18 at 08:10
I have no idea what your question is aiming at. I patiently explained that the end result of an Eclipse Detail Formatter is always a `String` instance, no matter what you do in between, and that the UTF-8 encoding is not involved anywhere as far as Eclipse’s Detail Formatter is concerned. The intermediate conversion to UTF-8 is just distracting. I got your downvote for that, along with some rants claiming that “UTF-8 character” was a thing and a misquotation of my sentence “there is nothing you can do **in a Detail Formatter**”, so you are free to explain what actual valuable content is hiding in your question. – Holger Jul 03 '18 at 08:17
I understand that the encoding cannot be set by code; instead I expected a dropdown somewhere in the Detail Formatter. I did not differentiate between charset and encoding in the question, which makes the question bad. You made important contributions to the question, and I will update it soon to include your improvements. I edited the question. – Grim Jul 03 '18 at 08:23
@PeterRader for software exchanging bytes, it makes sense to consider charset and encoding together, but here, the problem is that the entire process is already defined in terms of Unicode, the Java `String`, the JVM’s debugging interface, as well as Eclipse, a Java software, are all using Unicode. *In theory*, it should already work and in practice, it works for all codepoints in the BMP, i.e. `0 - \uFFFF` range. It’s a bug that it doesn’t work for the SMP characters, i.e. above `\uFFFF`. There is no need to select an encoding, as the transport protocols are fine. It’s a bug. – Holger Jul 03 '18 at 08:31
To name a counter-example, the console view showing the application’s output, sends and receives bytes, hence, depends on the chosen encoding. – Holger Jul 03 '18 at 08:33
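
The point about byte-based channels can be sketched as follows: the same String yields different bytes depending on the charset, and a charset that cannot represent the character loses it. EncodingBytesDemo and its hex helper are hypothetical names; the charsets come from java.nio.charset.StandardCharsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingBytesDemo {
    // Hypothetical helper: encodes the string with the given charset and
    // returns the bytes as lowercase hex, separated by spaces.
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String clipboard = new String(Character.toChars(0x1F4CB));
        System.out.println(hex(clipboard, StandardCharsets.UTF_8));    // f0 9f 93 8b
        System.out.println(hex(clipboard, StandardCharsets.UTF_16BE)); // d8 3d dc cb
        // ISO-8859-1 cannot represent the character, so this round trip is lossy.
        String lossy = new String(clipboard.getBytes(StandardCharsets.ISO_8859_1),
                StandardCharsets.ISO_8859_1);
        System.out.println(clipboard.equals(lossy)); // prints false
    }
}
```

This is why the console view depends on the configured encoding, while the debugger's String-based path does not.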

Karol Dowbecki · Answer 2 · 2018-07-02T07:26:35.913

Your code and the clipboard emoji work just fine in IntelliJ 2018.1. Both the debugger's variables view and the console output are working.

It's unlikely that this is a problem with the code. Maybe the font used by your Eclipse installation can't render Unicode emoji? I'd imagine that Eclipse understands the concept of code points when displaying the tooltips.

The code I executed in IntelliJ:

byte[] db = new byte[] { -16, -97, -109, -117 };
String x = new String(db, Charset.forName("utf8"));
System.out.println(x);
String f = new String(x.getBytes("utf8"), Charset.forName("utf8")).concat(" <---");
System.out.println(f);

I observed the following in the debugger: [screenshot not preserved]
