I have the following Objective-C code:

[@"http://www.google.com" stringByAddingPercentEncodingWithAllowedCharacters:[NSCharacterSet URLPathAllowedCharacterSet]];
// http%3A//www.google.com

And yet, in Swift:

"http://www.google.com".addingPercentEncoding(withAllowedCharacters: .urlPathAllowed)
// http://www.google.com

To what can I attribute this discrepancy?

And for extra credit, can I rely on this code to encode URL path reserved characters when passing a full URL like this?

Alex Bollbach

1 Answer

The issue actually rests in the difference between the NSString method stringByAddingPercentEncodingWithAllowedCharacters and the String method addingPercentEncoding(withAllowedCharacters:). This behavior has also been changing from version to version. (It looks like the latest beta of iOS 11 now restores the behavior we used to see.)

I believe the root of the issue rests in the particulars of how paths are percent encoded. Section 3.3 of RFC 3986 says that colons are permitted in paths except in the first segment of a relative path.

The NSString method captures this notion, e.g. imagine a path whose first directory was foo: (with a colon) and a subdirectory of bar: (also with a colon):

NSString *string = @"foo:/bar:";
NSCharacterSet *cs = [NSCharacterSet URLPathAllowedCharacterSet];
NSLog(@"%@", [string stringByAddingPercentEncodingWithAllowedCharacters:cs]);

That results in:

foo%3A/bar:

The : in the first segment of the path is percent encoded, but the : in subsequent segments is not. This captures the logic of how to handle colons in relative paths per RFC 3986.

The String method addingPercentEncoding(withAllowedCharacters:), however, does not do this:

let string = "foo:/bar:"
os_log("%@", string.addingPercentEncoding(withAllowedCharacters: .urlPathAllowed)!)

Yields:

foo:/bar:

Clearly, the String method does not attempt that position-sensitive logic. This implementation is more in keeping with the name of the method: it considers solely which characters are "allowed", with no special logic that tries to guess, based upon where the allowed character appears, whether it's truly allowed or not.
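As a quick check of the "allowed characters only" reading, here is a minimal sketch that asks the character set directly whether : is a member. The input string "foo bar?baz" is just an illustrative assumption; it avoids colons on purpose, since their handling is the part that has varied between versions.

```swift
import Foundation

// ":" is a member of the path-allowed set, so an encoder that looks
// only at set membership has no reason to escape it.
let colon: Unicode.Scalar = ":"
let colonAllowed = CharacterSet.urlPathAllowed.contains(colon)
print(colonAllowed)  // true

// Characters outside the set, such as a space or "?", are escaped.
let encoded = "foo bar?baz".addingPercentEncoding(withAllowedCharacters: .urlPathAllowed)
print(encoded ?? "nil")  // foo%20bar%3Fbaz
```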


I gather that you are saddled with the code supplied in the question, but we should note that this behavior of percent escaping colons in relative paths, while useful for explaining what you experienced, is not really relevant to your immediate problem. The code you have been provided is simply incorrect. It is attempting to percent encode a URL as if it were just a path. But it’s not a path; it’s a URL, which is a different thing with its own rules.

The deeper insight in percent encoding URLs is to acknowledge that different components of a URL allow different sets of characters, i.e. they require different percent encoding. That’s why NSCharacterSet has so many different URL-related character sets.

You really should percent encode the individual components, percent encoding each with the character set allowed for that type of component. Only when the individual components are percent encoded should they then be concatenated together to form the whole URL.
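A minimal sketch of that component-by-component approach might look like the following. The host, path, query value, and fragment are hypothetical inputs chosen for illustration, and the query value is assumed to contain no &, = or + (those would need a stricter set than .urlQueryAllowed).

```swift
import Foundation

// Hypothetical, unencoded inputs for illustration only.
let scheme = "http"
let host = "www.example.com"
let path = "/some path/report#1"
let queryValue = "swift url"   // assumed to contain no "&", "=" or "+"
let fragment = "section 2"

// Escape each component with the character set that matches its role.
let encodedHost = host.addingPercentEncoding(withAllowedCharacters: .urlHostAllowed) ?? host
let encodedPath = path.addingPercentEncoding(withAllowedCharacters: .urlPathAllowed) ?? path
let encodedValue = queryValue.addingPercentEncoding(withAllowedCharacters: .urlQueryAllowed) ?? queryValue
let encodedFragment = fragment.addingPercentEncoding(withAllowedCharacters: .urlFragmentAllowed) ?? fragment

// Concatenate only after every component has been encoded.
let urlString = "\(scheme)://\(encodedHost)\(encodedPath)?q=\(encodedValue)#\(encodedFragment)"
print(urlString)
// http://www.example.com/some%20path/report%231?q=swift%20url#section%202
```

Note that the # inside the path is escaped, while the # that introduces the fragment is added literally during concatenation. Encoding the assembled string in one pass could not tell the two apart.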

Alternatively, NSURLComponents is designed precisely for this purpose, getting you out of the weeds of percent-encoding the individual components yourself. For example:

var components = URLComponents(string: "http://httpbin.org/post")!
let foo = URLQueryItem(name: "foo", value: "bar & baz")
let qux = URLQueryItem(name: "qux", value: "42")
components.queryItems = [foo, qux]

let url = components.url!

That yields the following, with the & and the two spaces properly percent escaped within the foo value, while correctly leaving the & between foo and qux unescaped:

http://httpbin.org/post?foo=bar%20%26%20baz&qux=42

It’s worth noting, though, that NSURLComponents has a small, yet fairly fundamental flaw. Specifically, if your query values (NSURLQueryItem) could contain + characters, most web services need those percent escaped, but NSURLComponents won’t do that. If your URL has query components and those query values might include + characters, I’d advise against NSURLComponents and would instead advise percent encoding the individual components of a URL yourself.
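One way to do that by hand for the query is sketched below. It starts from .urlQueryAllowed and removes the delimiters that should not appear unescaped inside a name or value, including +. The helper names queryValueAllowed and encodedQuery are hypothetical, and the exact list of removed characters is an assumption based on the reserved characters in RFC 3986, so adjust it to what your web service expects.

```swift
import Foundation

// Hypothetical helper: the characters allowed inside a single query
// name or value. Starts from .urlQueryAllowed and removes the RFC 3986
// delimiters, so "&", "=" and "+" all get escaped.
func queryValueAllowed() -> CharacterSet {
    var allowed = CharacterSet.urlQueryAllowed
    allowed.remove(charactersIn: ":#[]@!$&'()*+,;=")
    return allowed
}

// Hypothetical helper: encodes each name and value separately, then
// joins the pairs with literal "=" and "&" delimiters.
func encodedQuery(_ items: [(name: String, value: String)]) -> String {
    let allowed = queryValueAllowed()
    return items.map { item -> String in
        let name = item.name.addingPercentEncoding(withAllowedCharacters: allowed) ?? item.name
        let value = item.value.addingPercentEncoding(withAllowedCharacters: allowed) ?? item.value
        return "\(name)=\(value)"
    }.joined(separator: "&")
}

let query = encodedQuery([(name: "foo", value: "1+1 & 2"), (name: "qux", value: "42")])
print("http://httpbin.org/post?" + query)
// http://httpbin.org/post?foo=1%2B1%20%26%202&qux=42
```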

Rob
  • Spot-on response. How deceptive indeed that what one would think is a bridged and therefore equivalent line of code yields different results. I should have thought to print out the respective `URLPathAllowedCharacterSet` and `urlPathAllowed` sets. – Alex Bollbach Jun 06 '17 at 19:12
  • I tried initializing a `URLComponents` with the string and reconstituting the required segments while percent encoding the relevant one (in this case path). a test remains failing. as e.g., "some example expression" has a space and fails to initialize (to a non nil value) the `URLComponents`. – Alex Bollbach Jun 06 '17 at 19:16
  • When you use `URLComponents`, you don't do any percent escaping. You supply it unencoded strings for the various components and then ask it for the URL, and it will return a percent-encoded URL. It's hard to comment on the specifics without seeing what you tried. – Rob Jun 06 '17 at 19:24
  • I'm using `URLComponents` to break down the URL into the various components. I'm then composing them into a string and picking out the `urlComps.percentEncodedPath` to substitute for the original path. This should have the effect of applying percent encoding only to the path, as this method seems to require. The problem is that I'm being given query parameters with spaces, and those have the `URLComponents` initializer return nil, and so it doesn't work for that case. – Alex Bollbach Jun 06 '17 at 19:27
  • @AlexBollbach - See `URLComponents` example in revised answer. – Rob Jun 06 '17 at 19:38
  • @AlexBollbach - I dug into this, and the problem is not the character sets. The difference actually rests in the behavior of `NSString` method `stringByAddingPercentEncodingWithAllowedCharacters` (which percent encodes `:` in the first segment of a path even though `:` is in the list of allowed characters). In contrast, the `String` method `addingPercentEncoding(withAllowedCharacters:)` does not do that. See revised answer. – Rob Jun 06 '17 at 21:03