I want to create an instance of java.net.URI using individual URI components, namely:
- scheme
- userInfo
- host
- port
- path
- query
- fragment
There is a constructor in the java.net.URI class that allows me to do this. Here is the code from the library:
public URI(String scheme,
           String authority,
           String path, String query, String fragment)
    throws URISyntaxException
{
    String s = toString(scheme, null,
                        authority, null, null, -1,
                        path, query, fragment);
    checkPath(s, scheme, path);
    new Parser(s).parse(false);
}
This constructor will also encode path, query, and fragment parts of the URI, so for example if I pass already encoded strings as arguments, they will be double encoded.
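The double encoding can be reproduced with a small sketch. The class name DoubleEncodingDemo, the helper build, and the host example.com are made up for illustration; only the five-argument java.net.URI constructor shown above is used.

```java
import java.net.URI;
import java.net.URISyntaxException;

class DoubleEncodingDemo {
    // Hypothetical helper: builds a URI from a path with the five-argument constructor.
    static String build(String path) {
        try {
            return new URI("http", "example.com", path, null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // A raw space is quoted once.
        System.out.println(build("/a b"));   // http://example.com/a%20b
        // An already encoded space is quoted again, because '%' itself is always quoted.
        System.out.println(build("/a%20b")); // http://example.com/a%2520b
    }
}
```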
The JavaDoc for this constructor states:
- If a path is given then it is appended. Any character not in the unreserved, punct, escaped, or other categories, and not equal to the slash character ('/') or the commercial-at character ('@'), is quoted.
- If a query is given then a question-mark character ('?') is appended, followed by the query. Any character that is not a legal URI character is quoted.
- Finally, if a fragment is given then a hash character ('#') is appended, followed by the fragment. Any character that is not a legal URI character is quoted.
It states that unreserved, punct, and escaped characters are NOT quoted. The punct characters include: ! # $ & ' ( ) * + , ; = :
According to RFC 3986, the reserved characters are:
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
So, if the characters @, / and + are reserved and should always be encoded (or am I missing something?) according to the most up-to-date RFC on URIs, then why does the java.net.URI JavaDoc state that it will not encode punct characters (which include + and =), @ and /?
Here is a little example I ran:
String scheme = "http";
String userInfo = "username:password";
String host = "example.com";
int port = 80;
String path = "/path/t+/resource";
String query = "q=search+term";
String fragment = "section1";
URI uri = new URI(scheme, userInfo, host, port, path, query, fragment);
uri.toString(); // does not encode `+` in the path
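To make the behavior easier to see, here is a self-contained variant of the same experiment. The class name ReservedCharsDemo and the helper build are made up for illustration, and I added a space to the path and query to contrast a character that is not legal in a URI with the reserved ones.

```java
import java.net.URI;
import java.net.URISyntaxException;

class ReservedCharsDemo {
    // Hypothetical helper around the seven-argument constructor used above.
    static String build(String path, String query) {
        try {
            return new URI("http", "username:password", "example.com", 80,
                           path, query, "section1").toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // '+', '=', '&' and '/' are left alone; only the space is quoted.
        System.out.println(build("/path/t+/a b", "q=search+term&x=a b"));
        // http://username:password@example.com:80/path/t+/a%20b?q=search+term&x=a%20b#section1
    }
}
```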
I don't understand: if this is the correct behavior and those characters indeed don't need to be encoded, then why are they referred to as "reserved" in the RFC? I'm trying to implement a function that takes a whole URI string and encodes it (that is, extracts the path, query, and fragment, encodes the reserved characters in them, and puts the URI back together).
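For reference, this is roughly the kind of function I have in mind. It is only a sketch: the class ComponentEncoder, the methods percentEncode and encodeUri, and above all the choice of which delimiters to leave unencoded in each component ("/" in the path, "=" and "&" in the query) are my own assumptions, not something I took from the RFC or the JavaDoc. It also assumes the input is already parseable by java.net.URI and has an authority.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;

class ComponentEncoder {
    // RFC 3986 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
    private static final String UNRESERVED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";
    private static final String HEX = "0123456789ABCDEF";

    // Percent-encodes every UTF-8 byte that is neither unreserved nor listed in keep.
    static String percentEncode(String component, String keep) {
        StringBuilder out = new StringBuilder();
        for (byte b : component.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            boolean literal = v < 0x80
                && (UNRESERVED.indexOf(v) >= 0 || keep.indexOf(v) >= 0);
            if (literal) {
                out.append((char) v);
            } else {
                out.append('%').append(HEX.charAt(v >> 4)).append(HEX.charAt(v & 0x0F));
            }
        }
        return out.toString();
    }

    // Splits the URI, re-encodes path, query and fragment, and reassembles it.
    // The kept delimiters per component are an assumption of this sketch.
    static String encodeUri(String input) {
        URI u = URI.create(input);
        StringBuilder out = new StringBuilder();
        out.append(u.getScheme()).append("://").append(u.getRawAuthority());
        if (u.getPath() != null) {
            out.append(percentEncode(u.getPath(), "/"));
        }
        if (u.getQuery() != null) {
            out.append('?').append(percentEncode(u.getQuery(), "=&"));
        }
        if (u.getFragment() != null) {
            out.append('#').append(percentEncode(u.getFragment(), ""));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(
            encodeUri("http://example.com/path/t+/resource?q=search+term#section1"));
        // http://example.com/path/t%2B/resource?q=search%2Bterm#section1
    }
}
```

Note that getPath(), getQuery() and getFragment() return decoded values, so an input that already contains %2F in the path would lose the distinction between an encoded and a literal slash with this approach.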