How to normalise a URL in Java to remove the fragment. I.e. from https://www.website.com#something to https://www.website.com

This is possible with the URL.Normalize code, although in this specific use case I've only got a full absolute URL which needs to remain intact.

I'd like to be able to modify this code slightly to remove the fragment from the URL:

//The website below is just an example. In reality, this URL is unknown and could be anything. Both with and without a fragment depending on the use case
URL absUrl = new URL("https://www.website.com#something");

My thinking so far is that this is only possible by breaking the URL down into Protocol + Domain + Path and then joining it all back together. That does appear to work, but there must be a more elegant way of doing this.

Michael Cropper
  • You could also use ``substring`` and ``indexOf`` using the ``#`` character. – f1sh Dec 11 '16 at 20:09
  • The # may not always be present though so that would involve a bit more checking first. But possible. – Michael Cropper Dec 11 '16 at 20:15
  • Possible duplicate of [How to normalize a URL in Java?](http://stackoverflow.com/questions/2993649/how-to-normalize-a-url-in-java) – n247s Dec 11 '16 at 20:37
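
The string-based idea from the comments can be sketched as follows, including the check for a URL that has no ``#`` at all. This is only an illustration: the class name `FragmentStrings` and the method `stripFragment` are made up for this example, and it works on the URL text rather than on a `URL` object.

```java
final class FragmentStrings {

    /**
     * Returns the text before the first '#', or the input unchanged
     * when it contains no '#' (hypothetical helper, not a library method).
     */
    static String stripFragment(String url) {
        int hash = url.indexOf('#');
        return hash < 0 ? url : url.substring(0, hash);
    }

    public static void main(String[] args) {
        System.out.println(stripFragment("https://www.website.com#something"));
        System.out.println(stripFragment("https://www.website.com/page?x=1"));
    }
}
```

In a valid URL an unencoded ``#`` can only introduce the fragment, so cutting at the first one is enough; the result can be passed back to `new URL(...)` if a `URL` object is needed.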

2 Answers

A java.net.URL exposes its fragment through getRef(), but it offers no method to remove it. You can, however, convert the URL into a URI and back, dropping the fragment on the way. I did it like this:

URL url;
...
if (url.getRef() != null) {  // getRef() is null when the URL has no fragment
  try {
    // Convert to a URI, then rebuild it from its decoded components,
    // keeping user info, port and query, and passing null for the fragment.
    URI uri = url.toURI();
    uri = new URI(uri.getScheme(), uri.getUserInfo(), uri.getHost(), uri.getPort(),
                  uri.getPath(), uri.getQuery(), null);
    url = uri.toURL();
  } catch (URISyntaxException e) {
    ...
  } catch (MalformedURLException e) {
    ...
  }
}
Dave Moten
Fragment removal is fairly simple using the conversion methods toURI and toURL. So to convert a URL to a URI:

URL url = /*what have you*/ …
URI u = url.toURI();

To remove any fragment from the URI:

if (u.getFragment() != null) {
    // Remake with the same parts, less the fragment:
    u = new URI(u.getScheme(), u.getSchemeSpecificPart(), /*fragment*/ null);
}

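In reconstructing a URI from its parts like that, it’s important to use the decoded getters (as shown), not the corresponding raw ones. The multi-argument URI constructors quote illegal characters themselves and always quote the percent character, so feeding them raw, already-encoded parts would encode those parts a second time. For authority on this usage, see e.g. the Identities section of the java.net.URI API documentation.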

To convert the result back to a URL:

url = u.toURL();
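
Put together, the three steps above might look like the sketch below. The class name `FragmentRemover` and the method `withoutFragment` are hypothetical names chosen for this example, and wrapping the checked exceptions in an `IllegalArgumentException` is just one possible way to handle them.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

final class FragmentRemover {

    /** Returns the same URL without its fragment (hypothetical helper). */
    static URL withoutFragment(URL url) {
        try {
            URI u = url.toURI();
            if (u.getFragment() == null) {
                return url; // nothing to remove
            }
            // Rebuild from the decoded parts, passing null as the fragment.
            return new URI(u.getScheme(), u.getSchemeSpecificPart(), null).toURL();
        } catch (URISyntaxException | MalformedURLException e) {
            throw new IllegalArgumentException("Cannot strip fragment from " + url, e);
        }
    }

    /** Convenience wrapper for callers that start from a string. */
    static String withoutFragment(String absoluteUrl) {
        try {
            return withoutFragment(URI.create(absoluteUrl).toURL()).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + absoluteUrl, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(withoutFragment("https://www.website.com#something"));
        System.out.println(withoutFragment("https://www.website.com/a/b?x=1&y=2#frag"));
    }
}
```

A URL without a fragment is returned as the same object, so the helper can be applied to any absolute URL without checking for a ``#`` first.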
Michael Allan