This is by design. From what I can tell, the 2nd character in your string is the code point \u0149. According to the latest Unicode code charts, "this character is deprecated and its use is strongly discouraged".
The Unicode code chart says that the deprecated code point is equivalent to \u02bc followed by \u006e.
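One way to see this equivalence from Java is to ask java.text.Normalizer for the compatibility decomposition of the character. This is only a sketch: the class and helper names are made up for illustration, and it assumes that the equivalence listed in the code chart is the same one the JDK's normalization data uses.

```java
import java.text.Normalizer;

public class DeprecatedCodePointDemo {

    // Returns the compatibility decomposition (NFKD) of the input string.
    static String compatDecompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKD);
    }

    // Renders each UTF-16 code unit as a four-digit hex escape for display.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("\\u%04x", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The deprecated code point discussed above.
        String deprecated = "\u0149";
        System.out.println(escape(deprecated) + " decomposes to "
                + escape(compatDecompose(deprecated)));
    }
}
```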
According to the javadocs, the first step that IDN.toASCII(String) performs is to process the characters in the input string using the RFC 3491 stringprep / nameprep algorithm. The RFC abstract says:
This document describes how to prepare internationalized domain name
(IDN) labels in order to increase the likelihood that name input and
name comparison work in ways that make sense for typical users
throughout the world. This profile of the stringprep protocol is
used as part of a suite of on-the-wire protocols for
internationalizing the Domain Name System (DNS).
(In other words, stringprep is designed to make it harder to create tricky domain names that look like one thing and mean something different.)
In fact, if you drill down, you will find that the prescribed mapping in the stringprep tables for \u0149 is \u02bc \u006e; i.e. the equivalent defined in the Unicode code charts.
And ... that is what is happening.
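The behaviour can be checked with a small round-trip test. This is a sketch only: the label "a\u0149b" is a made-up input rather than the string from the question, and the class and method names are illustrative. If the mapping described above is applied, the round-tripped string should differ from the original and contain \u02bc followed by n in place of \u0149.

```java
import java.net.IDN;

public class IdnRoundTripDemo {

    // Converts a name to its ASCII-compatible encoding and back again.
    static String roundTrip(String name) {
        String ascii = IDN.toASCII(name);   // nameprep is applied here
        return IDN.toUnicode(ascii);
    }

    public static void main(String[] args) {
        // Hypothetical label containing the deprecated code point.
        String original = "a\u0149b";
        String back = roundTrip(original);

        System.out.println("ASCII form:         " + IDN.toASCII(original));
        System.out.println("same as original:   " + original.equals(back));
        System.out.println("mapped replacement: " + back.equals("a\u02bcnb"));
    }
}
```

Comparing against the mapped form, rather than against the original input, is the comparison that makes sense once nameprep has been applied.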
Summary
- Your expectation that you can round-trip IDNs is ill-founded.
- You shouldn't be using that character anyway, since it is deprecated. (Certainly, it is a bad idea to use it in an IDN!)