6

I have a URI string like the following:

http://www.christlichepartei%F6sterreichs.at/steiermark/

I'm creating a java.lang.URI instance with this string and it succeeds but when I want to retrieve the host it returns null. Opera and Firefox also choke on this URL if I enter it exactly as shown above. But shouldn't the URI class throw a URISyntaxException if it is invalid? How can I detect that the URI is illegal then?

It also behaves the same when I decode the string using URLDecoder which yields

http://www.christlicheparteiösterreichs.at/steiermark/

Now this is accepted by Opera and Firefox but java.net.URI still doesn't like it. How can I deal with such a URL?

thanks

Raoul Duke

3 Answers

4

Java 6 has the IDN class (java.net.IDN) for working with internationalized domain names. The following produces a URI with an ASCII-encoded host name:

URI u = new URI("http://" + IDN.toASCII("www.christlicheparteiösterreichs.at") + "/steiermark/");
Janus Troelsen
axtavt
  • `IDN#toASCII` is intended to work only on labels or full domain names, not on entire URI strings. Passing anything else may have unintended consequences. – NickAldwin Jul 17 '14 at 20:59
2

The correct way to encode non-ASCII characters in hostnames is known as "Punycode".
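As a rough illustration of what that encoding looks like in Java, here is a minimal sketch using java.net.IDN on the host name from the question. The class and method names (PunycodeDemo, toAsciiHost, toUnicodeHost) are made up for this example; only the host name is passed in, never a full URL.

```java
import java.net.IDN;

// Hypothetical demo class; the names are assumptions for illustration only.
class PunycodeDemo {

    // Converts a Unicode host name to its ASCII-compatible (Punycode) form.
    static String toAsciiHost(String unicodeHost) {
        return IDN.toASCII(unicodeHost);
    }

    // Converts an ASCII-compatible host name back to its Unicode form.
    static String toUnicodeHost(String asciiHost) {
        return IDN.toUnicode(asciiHost);
    }

    public static void main(String[] args) {
        // \u00f6 is the letter o with diaeresis, written as an escape so the
        // example does not depend on the source file encoding.
        String ascii = toAsciiHost("www.christlichepartei\u00f6sterreichs.at");
        System.out.println(ascii);                // www.xn--christlicheparteisterreichs-5yc.at
        System.out.println(toUnicodeHost(ascii)); // the original Unicode host name
    }
}
```

Only the label that contains a non-ASCII character gets the xn-- prefix; the www and at labels are left unchanged.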

MSalters
2

URI does throw a URISyntaxException when you choose the appropriate constructor:

URI someUri = new URI("http", "www.christlicheparteiösterreichs.at", "/steiermark", null);

java.net.URISyntaxException: Illegal character in hostname at index 28: http://www.christlicheparteiösterreichs.at/steiermark

You can use IDN to fix this:

URI someUri = new URI("http", IDN.toASCII("www.christlicheparteiösterreichs.at"), "/steiermark", null);
System.out.println(someUri);
System.out.println("host: " + someUri.getHost());

Output:

http://www.xn--christlicheparteisterreichs-5yc.at/steiermark

host: www.xn--christlicheparteisterreichs-5yc.at
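If you only have the complete string, one possible way to detect the problem is to ask the parsed URI to re-check its authority with URI#parseServerAuthority(), which throws a URISyntaxException when the authority cannot be read as user info, host and port. The sketch below is an assumption about how such a check could look, not a complete validator; the class and method names (HostCheck, hasParsableHost) are invented for this example.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper class; the names are assumptions for illustration only.
class HostCheck {

    // Returns true only if the string parses as a URI and java.net.URI
    // can extract a host from its authority component.
    static boolean hasParsableHost(String s) {
        try {
            // new URI(String) accepts authorities it cannot split into
            // user info, host and port; parseServerAuthority() rejects them.
            URI u = new URI(s).parseServerAuthority();
            return u.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasParsableHost(
                "http://www.christlichepartei%F6sterreichs.at/steiermark/"));
        System.out.println(hasParsableHost(
                "http://www.xn--christlicheparteisterreichs-5yc.at/steiermark/"));
    }
}
```

The explicit null check on getHost() also covers URIs that have no authority at all, such as mailto: addresses.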

UPDATE regarding the chicken-egg-problem:

You can let URL do the job:

public static URI createSafeURI(final URL someURL) throws URISyntaxException
{
    // Rebuild the URI from the URL's parts; only the host is converted to its ASCII (Punycode) form.
    return new URI(someURL.getProtocol(), someURL.getUserInfo(),
            IDN.toASCII(someURL.getHost()), someURL.getPort(),
            someURL.getPath(), someURL.getQuery(), someURL.getRef());
}


URI raoul=createSafeURI(new URL("http://www.christlicheparteiösterreichs.at/steiermark/readme.html#important"));

This is just a quick shot; it has not been checked against all the issues involved in converting a URL to a URI. Use it as a starting point.

Michael Konietzka
  • Hi. Thanks for your answer, but how does the URI constructor help me when I don't have the individual parts of the URL? It's a bit of a chicken-and-egg problem :) – Raoul Duke Sep 28 '10 at 08:02
  • You are right. It depends on where you get your data from. If you get a String like "http://www.christlicheparteiösterreichs.at/steiermark/" as input, you cannot use it in new URI(String), because the JavaDoc states that it expects an already correct URI string, and this string is not one. You have to check where in the data flow the String gets "corrupted". Where does this string come from? – Michael Konietzka Sep 28 '10 at 08:44
  • Hi, thanks for taking the time to look into this. The suggestion in your update looks promising; I can probably work with that. Thanks again! – Raoul Duke Sep 30 '10 at 09:25