177

In Ruby on Rails 3 (currently using Beta 4), I see that when using the form_tag or form_for helpers, a hidden field named _snowman with the value ☃ (Unicode U+2603, decimal 9731) shows up.

So, what is this for?

Peter Mortensen
Matthew Savage
  • 2
    This is a 'documentation' type Q&A - I tried to find an answer here and ended up digging through the commit messages so I figured I'd share it here for others who are wondering about the snowman... – Matthew Savage Jul 11 '10 at 05:47
  • Also see [this](http://programmers.stackexchange.com/q/168751/37622). – MasterMastic Oct 12 '13 at 15:17

2 Answers

309

This parameter was added to forms in order to force Internet Explorer (5, 6, 7 and 8) to encode its parameters as Unicode.

Specifically, this bug can be triggered if the user switches the browser's encoding to Latin-1. To understand why a user would decide to do something seemingly so crazy, check out this Google search. Once the user has put the website into Latin-1 mode, if they use characters that can be understood as both Latin-1 and Unicode (for instance, é or ç, common in names), Internet Explorer will encode them in Latin-1.

This means that if a user searches for "Ché Guevara", it will come through incorrectly on the server-side. In Ruby 1.9, this will result in an encoding error when the text inevitably makes its way into the regular expression engine. In Ruby 1.8, it will result in broken results for the user.
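As a rough illustration of the mismatch (a sketch only, not code from Rails; the constant and method names are made up for this example), the same name produces different bytes in each encoding, and the Latin-1 bytes are not valid UTF-8:

```ruby
# Hypothetical illustration: the search term from the example above.
NAME = "Ch\u00E9 Guevara"

# The bytes a browser would send in each encoding.
UTF8_BYTES   = NAME.encode("UTF-8").bytes       # é becomes 0xC3 0xA9
LATIN1_BYTES = NAME.encode("ISO-8859-1").bytes  # é becomes the single byte 0xE9

# A server that assumes UTF-8 simply labels whatever bytes arrive as UTF-8.
def received_as_utf8(bytes)
  bytes.pack("C*").force_encoding("UTF-8")
end

# On Ruby 1.9+ a regexp match against a string with invalid bytes is
# expected to raise ArgumentError; it is rescued here so the sketch keeps running.
def safe_match?(string, pattern)
  !!(string =~ pattern)
rescue ArgumentError
  :invalid_bytes
end

good = received_as_utf8(UTF8_BYTES)
bad  = received_as_utf8(LATIN1_BYTES)

puts good.valid_encoding?  # the UTF-8 bytes are fine
puts bad.valid_encoding?   # the Latin-1 bytes are not valid UTF-8
puts safe_match?(bad, /Guevara/).inspect
```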

By creating a parameter that can only be understood by IE as a Unicode character, we are forcing IE to look at the accept-charset attribute, which then tells it to encode all of the characters as UTF-8, even ones that can be encoded in Latin-1.
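To make the trick concrete, here is a minimal sketch in plain Ruby. The helper names are hypothetical and the markup only approximates what the Rails form helpers emit. It shows why the snowman works where é does not: the snowman has no Latin-1 representation at all.

```ruby
require "cgi"

SNOWMAN = "\u2603" # code point 9731 in decimal

# Hypothetical helper: true if the text can be written in the given charset.
def representable_in?(text, charset)
  text.encode(charset)
  true
rescue Encoding::UndefinedConversionError
  false
end

# Hypothetical helper: roughly the opening markup described above, with
# accept-charset on the form plus the hidden snowman field.
def snowman_form_open(action)
  %(<form action="#{CGI.escapeHTML(action)}" method="post" accept-charset="UTF-8">) +
    %(<input type="hidden" name="_snowman" value="&##{SNOWMAN.ord};" />)
end

puts snowman_form_open("/search")
puts representable_in?("\u00E9", "ISO-8859-1") # é fits in Latin-1
puts representable_in?(SNOWMAN, "ISO-8859-1")  # the snowman does not
```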

Keep in mind that in Ruby 1.8, it is trivial to get Latin-1 data into your UTF-8 database (since nothing in the entire stack checks that the bytes that the user sent at any point are valid UTF-8 characters). As a result, it's extremely common for Ruby applications (and PHP applications, etc.) to exhibit this user-facing bug, and therefore extremely common for users to try to change the encoding as a palliative measure.
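The kind of check that is missing from the stack could look something like the following sketch for Ruby 1.9 and later. This is an assumption for illustration, not what Rails does: the method name is hypothetical, and treating invalid input as Latin-1 is a guess that will be wrong for other legacy encodings.

```ruby
# Hypothetical guard: make sure a string is valid UTF-8 before storing it.
def normalize_to_utf8(raw)
  candidate = raw.dup.force_encoding("UTF-8")
  return candidate if candidate.valid_encoding?

  # Assumption: bytes that are not valid UTF-8 were sent as Latin-1.
  raw.dup.force_encoding("ISO-8859-1").encode("UTF-8")
end

latin1_input = "Ch\u00E9 Guevara".encode("ISO-8859-1").b
stored = normalize_to_utf8(latin1_input)
puts stored.encoding
puts stored.valid_encoding?
```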

All that said, when I wrote this patch, I didn't realize that the name of the parameter would ever appear in a user-facing place (it does with forms that use the GET action, such as search forms). Since it does, we will rename this parameter to _e, and use a more innocuous-looking Unicode character.

Andrew Marshall
Yehuda Katz
  • 1
    If this ends up a transparent parameter like _method, it'll probably be a lot less confusing. What a crazy thing to have to fix, though. – tadman Jul 28 '10 at 15:31
  • 1
    Thanks for the detailed response Yehuda - though I think keeping the snowman is the best outcome, it's probably one of those stupid things 'enterprises' will pick on - 'what the hell is this snowman thing?!? this is a business, not a game!'.. Ugh. – Matthew Savage Jul 29 '10 at 04:21
  • 1
    @Matthew, oddly enough you're right. But I do feel like the solution is pretty impressive. – JP Silvashy Aug 25 '10 at 19:51
  • 11
    Snowman has since been replaced by a hidden input named utf8 with value set to "✓". I use a form_tag for my language switcher and started to get lots of exceptions because one crawler appears to have problems with this value and incorrectly concatenates the utf8 parameter and its value with the value of a selection option in the form. – Christer Fernstrom May 28 '14 at 16:30
56

This is here to support Internet Explorer 5 and encourage it to use UTF-8 for its forms.

The commit message seen here details it as follows:

Fix several known web encoding issues:

  • Specify accept-charset on all forms. All recent browsers, as well as IE5+, will use the encoding specified for form parameters
  • Unfortunately, IE5+ will not look at accept-charset unless at least one character in the form's values is not in the page's charset. Since the user can override the default charset (which Rails sets to UTF-8), we provide a hidden input containing a unicode character, forcing IE to look at the accept-charset.
  • Now that the vast majority of web input is UTF-8, we set the inbound parameters to UTF-8. This will eliminate many cases of incompatible encodings between ASCII-8BIT and UTF-8.
  • You can safely ignore params[:_snowman]

In short, you can safely ignore this parameter.
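If the extra key gets in the way (for example when logging or re-serialising parameters), dropping it is straightforward. A minimal sketch on a plain Hash, with a hypothetical helper name; a real Rails params object may behave differently:

```ruby
# Keys that exist only to steer the browser's encoding (name taken from the question).
ENCODING_MARKER_KEYS = %w[_snowman].freeze

# Hypothetical helper: return a copy of the params without the marker key.
def without_encoding_marker(params)
  params.reject { |key, _value| ENCODING_MARKER_KEYS.include?(key.to_s) }
end

puts without_encoding_marker("q" => "Guevara", "_snowman" => "\u2603").inspect
```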

Still, I am not sure why we're supporting old technologies like Internet Explorer 5. It seems like a very non-Ruby on Rails decision if you ask me.

Peter Mortensen
Matthew Savage
  • 7
    The quotation says “IE5+”, so maybe the problem occurs in newer IE versions, too? – Philipp Jul 11 '10 at 08:24
  • 5
    For a more lengthy response, please take a look at http://github.com/rails/rails/commit/25215d7285db10e2c04d903f251b791342e4dd6a#commitcomment-118076 (also, see my response below) – Yehuda Katz Jul 27 '10 at 22:23