While poking around with the Uri class answering another question, I found something that seems strange to me:

Consider these two Uris:

var u1 = new Uri("http://a.b:33/abc%2fdef/c?d=f");
var u2 = new Uri("foobar://a.b:33/abc%2fdef/c?d=f");

They differ only by their scheme. All other elements of the supplied identifiers are the same.

So, why, when I dump the Segments property of these Uri instances, do I see the following output for u1:

/ 
abc/ 
def/ 
c 

...but a different output for u2?

/ 
abc%2fdef/ 
c 

Why is the parsing behaviour different for different schemes?

Jonathan Leffler
spender

1 Answer

The Uri class uses different parsers for different URI schemes. For example, for http and https URIs it uses an HttpStyleUriParser, for ftp URIs it uses an FtpStyleUriParser, and so on. URIs with unknown schemes are parsed by a GenericUriParser. You can register new schemes using the UriParser.Register method:

UriParser.Register(new HttpStyleUriParser(), "foobar", 33);
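
As a rough sketch of how this could be tried end to end, the program below registers the scheme and then prints the segments of both URIs from the question. The `SegmentsDemo` class and `Describe` helper are illustrative names, not part of the framework, and whether the two lines end up matching depends on the .NET version in use, so no output is shown.

```csharp
using System;

static class SegmentsDemo
{
    // Hypothetical helper: joins the Segments of a URI with " | "
    // so differences between two URIs are easy to spot.
    public static string Describe(string uri)
    {
        return string.Join(" | ", new Uri(uri).Segments);
    }

    static void Main()
    {
        // Register "foobar" with the HTTP-style parser before any foobar
        // URI is created. Registering a scheme that is already known throws,
        // hence the guard.
        if (!UriParser.IsKnownScheme("foobar"))
        {
            UriParser.Register(new HttpStyleUriParser(), "foobar", 33);
        }

        Console.WriteLine(Describe("http://a.b:33/abc%2fdef/c?d=f"));
        Console.WriteLine(Describe("foobar://a.b:33/abc%2fdef/c?d=f"));
    }
}
```

Registration is process-wide and cannot be undone, so it is probably best done once at application start-up.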
dtb
  • Good answer. +1. Can you tell me what requirements make these "specialized" schemes non-generic? I was under the impression that a URI is a URI and that one might reasonably expect them to be treated the same. – spender Apr 19 '13 at 15:02
  • All URIs conform to [RFC 3986](http://tools.ietf.org/html/rfc3986), but not all schemes take advantage of all features, or they put more restrictions on the URI syntax. The UriParser stuff in the .NET Framework is quite ugly under the hood; I recommend you take your favourite decompiler and take a look yourself. – dtb Apr 19 '13 at 15:09
  • Thanks. I was looking at [RFC1738](http://www.ietf.org/rfc/rfc1738.txt) section 3.3, which doesn't really come down one way or the other regarding **escaped** slashes. The presence in Apache of the [AllowEncodedSlashes Directive](http://httpd.apache.org/docs/2.2/mod/core.html#allowencodedslashes) suggests that they expect that escaped slashes might be considered. This implies to me that I'm seeing overzealous unescaping and the Uri behaviour is incorrect, but maybe I'm reading the runes incorrectly. – spender Apr 19 '13 at 15:18
  • First, RFC 1738 is ancient. The most up-to-date specification of the http/https URI schemes is in [draft-ietf-httpbis-p1-messaging](http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-22). (The draft is stable at the time of writing, but hasn't been published as an RFC yet. Technically speaking, [RFC 2616](http://tools.ietf.org/html/rfc2616) is the current specification.) Second, you are right: `%2F` in a path segment should not be decoded to a `/`. But the HttpStyleUriParser probably does this for the same reason that there is an AllowEncodedSlashes Directive in Apache. – dtb Apr 19 '13 at 15:28
  • Overall, there seem to be multiple versions of URI parsers in the .NET Framework (at least 3), and these seem to support various "quirks" for different URI schemes. If you need a standards-conforming URI parser, you should check MSDN for hints or customize your own UriParser. **Update:** Apparently there is a [DontUnescapePathDotsAndSlashes](http://msdn.microsoft.com/en-us/library/ee656542.aspx) option that you can set for *http* and *https* URIs in your *App.config*. – dtb Apr 19 '13 at 15:29
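
For reference, the DontUnescapePathDotsAndSlashes option mentioned in the last comment is usually set with a fragment along these lines. This is a sketch based on the documented `schemeSettings` element, assuming a .NET Framework application that reads *App.config*; add a second entry for *https* if needed.

```xml
<configuration>
  <uri>
    <schemeSettings>
      <!-- Keep %2f and dot segments in http paths as written -->
      <add name="http" genericUriParserOptions="DontUnescapePathDotsAndSlashes" />
    </schemeSettings>
  </uri>
</configuration>
```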