This one is complicated. From this article:
Resolving a domain name
If the string that represents the domain name is not in Unicode, the
user agent converts the string to Unicode. It then performs some
normalization functions on the string to eliminate ambiguities that
may exist in Unicode encoded text.
Normalization involves such things as converting uppercase characters
to lowercase, reducing alternative representations (eg. converting
half-width kana to full), eliminating prohibited characters (eg.
spaces), etc.
Next, the user agent converts each of the labels (ie. pieces of text
between dots) in the Unicode string to a punycode representation. A
special marker ('xn--') is added to the beginning of each label
containing non-ASCII characters to show that the label was not
originally ASCII. The end result is not very user friendly, but
accurately represents the original string of characters while using
only the characters that were previously allowed for domain names.
For example, the following domain name:
JP納豆.例.jp
converts to the following representation:
xn--jp-cd2fp15c.xn--fsq.jp
You can use the following code to perform this conversion.
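As a rough illustration of the steps described above, here is a minimal sketch of the host conversion. The helper names (`PunycodeEncodeLabel`, `IDNAEncodeHost`) are mine, not part of any library. The normalization step is only an approximation (NFKC plus lowercasing, not the full Nameprep/IDNA mapping), and the encoder follows the Punycode algorithm from RFC 3492 without its overflow checks, so treat it as a demonstration rather than production code.

```objc
#import <Foundation/Foundation.h>

// Punycode parameters from RFC 3492, section 5.
static const uint32_t kBase = 36, kTMin = 1, kTMax = 26, kSkew = 38, kDamp = 700;
static const uint32_t kInitialBias = 72, kInitialN = 128;

// Maps a digit value 0..35 to 'a'..'z' followed by '0'..'9'.
static unichar PunycodeDigit(uint32_t d) {
    return (unichar)(d < 26 ? 'a' + d : '0' + (d - 26));
}

// Bias adaptation (RFC 3492, section 6.1).
static uint32_t PunycodeAdapt(uint32_t delta, uint32_t numPoints, BOOL firstTime) {
    delta = firstTime ? delta / kDamp : delta / 2;
    delta += delta / numPoints;
    uint32_t k = 0;
    while (delta > ((kBase - kTMin) * kTMax) / 2) {
        delta /= kBase - kTMin;
        k += kBase;
    }
    return k + (kBase - kTMin + 1) * delta / (delta + kSkew);
}

// Hypothetical helper: encodes a single label. Pure-ASCII labels are returned
// unchanged; other labels get the "xn--" prefix. Overflow checks are omitted.
static NSString *PunycodeEncodeLabel(NSString *label) {
    // UTF-32 gives one 32-bit value per code point (no surrogate pairs to handle).
    NSData *utf32 = [label dataUsingEncoding:NSUTF32LittleEndianStringEncoding];
    const uint32_t *cp = utf32.bytes;
    NSUInteger length = utf32.length / sizeof(uint32_t);

    // Copy the basic (ASCII) code points to the output first.
    NSMutableString *out = [NSMutableString string];
    for (NSUInteger i = 0; i < length; i++) {
        uint32_t c = CFSwapInt32LittleToHost(cp[i]);
        if (c < 0x80) [out appendFormat:@"%C", (unichar)c];
    }
    uint32_t b = (uint32_t)out.length, h = b;
    if (b == length) return out;            // nothing to encode
    if (b > 0) [out appendString:@"-"];     // delimiter after the basic part

    uint32_t n = kInitialN, delta = 0, bias = kInitialBias;
    while (h < length) {
        // Find the smallest code point not yet handled.
        uint32_t m = UINT32_MAX;
        for (NSUInteger i = 0; i < length; i++) {
            uint32_t c = CFSwapInt32LittleToHost(cp[i]);
            if (c >= n && c < m) m = c;
        }
        delta += (m - n) * (h + 1);
        n = m;
        for (NSUInteger i = 0; i < length; i++) {
            uint32_t c = CFSwapInt32LittleToHost(cp[i]);
            if (c < n) delta++;
            if (c == n) {
                // Emit delta as a generalized variable-length integer.
                uint32_t q = delta;
                for (uint32_t k = kBase; ; k += kBase) {
                    uint32_t t = k <= bias ? kTMin : (k >= bias + kTMax ? kTMax : k - bias);
                    if (q < t) break;
                    [out appendFormat:@"%C", PunycodeDigit(t + (q - t) % (kBase - t))];
                    q = (q - t) / (kBase - t);
                }
                [out appendFormat:@"%C", PunycodeDigit(q)];
                bias = PunycodeAdapt(delta, h + 1, h == b);
                delta = 0;
                h++;
            }
        }
        delta++;
        n++;
    }
    return [@"xn--" stringByAppendingString:out];
}

// Hypothetical helper: approximates normalization with NFKC + lowercasing,
// then encodes each dot-separated label.
static NSString *IDNAEncodeHost(NSString *host) {
    NSString *normalized = [[host precomposedStringWithCompatibilityMapping] lowercaseString];
    NSMutableArray<NSString *> *labels = [NSMutableArray array];
    for (NSString *label in [normalized componentsSeparatedByString:@"."]) {
        [labels addObject:PunycodeEncodeLabel(label)];
    }
    return [labels componentsJoinedByString:@"."];
}
```

With these assumptions, `IDNAEncodeHost(@"JP納豆.例.jp")` should return `xn--jp-cd2fp15c.xn--fsq.jp`, matching the example above. For real use, a maintained IDNA implementation is a safer choice than a hand-rolled encoder.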
Resolving a path
If the string is input by the user or stored in a non-Unicode
encoding, it is converted to Unicode, normalized using Unicode
Normalization Form C, and encoded using the UTF-8 encoding.
The user agent then converts the non-ASCII bytes to percent-escapes.
For example, the following path:
/dir1/引き割り.html
converts to the following representation:
/dir1/%E5%BC%95%E3%81%8D%E5%89%B2%E3%82%8A.html
For this purpose, you can use the following code:
path = [URL.path stringByAddingPercentEncodingWithAllowedCharacters:[NSCharacterSet URLPathAllowedCharacterSet]];
Note that stringByAddingPercentEscapesUsingEncoding: is deprecated because each URL component or subcomponent has different rules for which characters are valid, and a single escaping rule cannot cover them all.
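To make the per-component rules concrete, here is a small sketch. `PercentEncode` is a hypothetical helper of mine, not a Foundation API. It applies Normalization Form C via `precomposedStringWithCanonicalMapping` before escaping, as the quoted text describes, and takes the allowed character set as a parameter.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: NFC-normalizes the string, then percent-escapes every
// character outside `allowed` using its UTF-8 bytes.
static NSString *PercentEncode(NSString *string, NSCharacterSet *allowed) {
    NSString *nfc = [string precomposedStringWithCanonicalMapping];
    return [nfc stringByAddingPercentEncodingWithAllowedCharacters:allowed];
}

// The same input is escaped differently depending on the component:
// "?" is not allowed in a path, but it is allowed inside a query.
static NSString *EncodedAsPath(NSString *string) {
    return PercentEncode(string, [NSCharacterSet URLPathAllowedCharacterSet]);
}

static NSString *EncodedAsQuery(NSString *string) {
    return PercentEncode(string, [NSCharacterSet URLQueryAllowedCharacterSet]);
}
```

For example, `EncodedAsPath(@"a?b")` gives `a%3Fb`, while `EncodedAsQuery(@"a?b")` leaves the string as `a?b`. That difference is the reason the allowed character set has to be chosen per component.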
Putting it all together
Resulting code:
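#import <Foundation/Foundation.h>
#import "NSStringPunycodeAdditions.h" // provides -IDNAEncodedString, from https://github.com/OnionBrowser/iOS-OnionBrowser/blob/master/OnionBrowser/NSStringPunycodeAdditions.h

@implementation NSURL (Normalization)

- (NSURL *)normalizedURL {
    NSURLComponents *components = [NSURLComponents componentsWithURL:self resolvingAgainstBaseURL:YES];
    // Convert each label of the host to its punycode (IDNA) form.
    components.host = [components.host IDNAEncodedString];
    // components.path returns the decoded path. Assign the escaped result to
    // percentEncodedPath: the plain path setter would escape the "%" signs again.
    components.percentEncodedPath = [components.path stringByAddingPercentEncodingWithAllowedCharacters:[NSCharacterSet URLPathAllowedCharacterSet]];
    return components.URL;
}

@end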
Unfortunately, actual URL "normalization" is more complicated - you need to handle all remaining URL components too. But I hope I've answered your question.