So I'm trying to use a Rails URL helper (page_url) to create URLs that contain special characters, including ampersands. Most cases work like you'd expect them to:

(rdb:1) page_url('foo', :host => 'host')
"http://host/pages/foo"
(rdb:1) page_url('foo_%_bar', :host => 'host')
"http://host/pages/foo_%25_bar"

But for some odd reason, ampersands are not escaped:

(rdb:1) page_url('foo_&_bar', :host => 'host')
"http://host/pages/foo_&_bar"

And if I pre-escape them, they get corrupted:

(rdb:1) page_url('foo_%26_bar', :host => 'host')
"http://host/pages/foo_%2526_bar"

CGI::escape, on the other hand, escapes them fine:

(rdb:1) CGI::escape('foo_&_bar')
"foo_%26_bar"

What's going on, and how do I work around this? (With something nicer than gsub('&', '%26'), that is.)

Makoto
lambshaanxy

2 Answers

I can't tell you a nicer way to deal with it - but I can explain why it's happening.

Ampersands are valid characters in a URL. Otherwise you'd have problems with something like "http://host/pages/foo?bar=baz&style=foo_style".

Edit: Digging deeper into the source code, it looks like Rails uses CGI.escape only on parameters.

The URL-generating helpers use url_for under the covers, which eventually calls http://apidock.com/rails/ActionController/Routing/Route/generate. That in turn calls code deep in the private methods of the source... but eventually ends up calling URI.escape for the path (first look in actionpack/lib/action_controller/routing/route.rb, then in actionpack/lib/action_controller/routing/segments.rb).

The end result is that on the URL itself, Rails uses URI.escape, which notably does not escape ampersands at all:

>> CGI.escape('/my_foo_&_bar')
=> "%2Fmy_foo_%26_bar"
>> URI.escape('/my_foo_&_bar')
=> "/my_foo_&_bar"
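
URI.escape was removed in Ruby 3.0, so the comparison above no longer runs on current Rubies. Here is a minimal sketch of the same difference. Note that escape_path_segment and SAFE_IN_SEGMENT are hypothetical names, not Rails code, and the safe-character set is an assumption loosely modelled on what a URL path segment may contain:

```ruby
require 'cgi'

# Hypothetical stand-in for path-segment escaping; NOT Rails' actual code.
# Assumption: letters, digits and - . _ ~ ! $ & ' ( ) * + , ; = : @ are left alone.
SAFE_IN_SEGMENT = /[^A-Za-z0-9\-._~!$&'()*+,;=:@]/

def escape_path_segment(segment)
  # Percent-encode every byte of each character outside the safe set.
  segment.gsub(SAFE_IN_SEGMENT) do |ch|
    ch.bytes.map { |b| format('%%%02X', b) }.join
  end
end

segment_escaped = escape_path_segment('foo_&_bar')    # => "foo_&_bar"
param_escaped   = CGI.escape('foo_&_bar')             # => "foo_%26_bar"
double_escaped  = escape_path_segment('foo_%26_bar')  # => "foo_%2526_bar"
```

The last line shows why pre-escaping gets corrupted: the '%' of an already-escaped '%26' is itself outside the safe set, so it is escaped again to '%25'.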

There's currently nothing you can do about this short of filing a feature request with the Rails team.

...unless you have the option of not using ampersands in your URLs. You can always gsub them out yourself for all URLs:

def my_clean_url(the_url)
   return the_url.gsub(/&/,'_')
end
>> my_clean_url('/my_foo_&_bar')
=> "/my_foo___bar"

page_url(my_clean_url('/my_foo_&_bar'))
Taryn East
  • The issue is that the & is used by the underlying (non-Ruby) app as a normal character, but many URL parsers see it as a parameter delimiter and chop up the URL when they see it. Encoding it makes sure it stays intact. – lambshaanxy Mar 28 '11 at 04:05
  • Great answer. Thanks Taryn! I did a quick experiment by creating a new Rails 3 project with a simple Car resource and ran the ASCII characters through it to see what gets encoded by Rails when I call car_path(some_ascii_char), i.e. (1..256).map {|car| app.car_path(car.chr)}. I found that everything gets encoded except ! $ % & ' - , + * ( ) / : ; = @ . _ ~ Hence only % / ? & are likely to be problematic and need to be manually encoded. – Declan McGrath Feb 14 '12 at 12:43
  • How about just calling: CGI.unescape(page_url('foo_%_bar', :host => 'host')) – Boris Jun 27 '13 at 15:16

For all those who are trying to just encode anything other than a-z, A-Z, 0-9 and underscore:

URI.encode(string, /\W/)

Say you have some content which may contain, for example, ampersands, and you want to use this content as the body parameter of a mailto link. Without /\W/, the ampersand (which is a safe URI character) would not be encoded and would therefore partially break the link.
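
URI.encode was removed in Ruby 3.0, so on a current Ruby the mailto case could be sketched like this. ERB::Util.url_encode is swapped in here as a substitute and is not an exact equivalent of the /\W/ version (it leaves '-', '.' and '~' unescaped); the address and body text are placeholders:

```ruby
require 'erb'

# Placeholder content containing an ampersand.
body = 'Tom & Jerry'

# Percent-encodes everything except letters, digits and a few
# unreserved characters, so '&' and spaces are both escaped.
encoded = ERB::Util.url_encode(body)   # => "Tom%20%26%20Jerry"

# someone@example.com is a placeholder address.
mailto = "mailto:someone@example.com?body=#{encoded}"
```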

svoop