I have a Rails server receiving input from an Android app. The app passes data to the server in JSON format, but the current version of the app does not encode strings as UTF-8, so they remain 'binary', containing e.g. '\xE0' instead of '\u00E0'. However, when sending this binary string, the app declares the HTTP connection's encoding as UTF-8, so the web server receiving the string believes it is UTF-8 when it is actually binary.
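For illustration, the mismatch can be reproduced in a plain Ruby session (a minimal sketch; the byte \xE4 is the Latin-1 encoding of 'ä'):

```ruby
# Binary bytes mislabeled as UTF-8, like the data arriving from the app:
s = " so un\xE4hnlich ".force_encoding('UTF-8')

# The string claims to be UTF-8, but \xE4 is not a valid UTF-8 sequence,
# so validation fails:
puts s.encoding          # => UTF-8
puts s.valid_encoding?   # => false
```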
I partially resolved this with the following line:
# encode comment to UTF-8, replace invalid bytes, collapse whitespace and strip the comment field
params[:data][:text] = params[:data][:text].encode('utf-8', 'binary', :invalid => :replace, :undef => :replace).gsub(/\s+/, " ").strip
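For context, this is what the encode call does on a plain Ruby string, outside of any web server (a sketch; :replace => '?' is spelled out explicitly here, because the documented default replacement for a Unicode target encoding is U+FFFD rather than '?'):

```ruby
raw = " so un\xE4hnlich "

# The source encoding is given explicitly as 'binary', so the string's
# own encoding tag is ignored; the byte \xE4, which has no defined
# conversion from binary to UTF-8, is replaced with '?':
clean = raw.encode('utf-8', 'binary',
                   :invalid => :replace, :undef => :replace, :replace => '?')
puts clean  # => " so un?hnlich "
```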
This was taken from here: Ruby String.encode still gives "invalid byte sequence in UTF-8"
It works fine on my development system using the built-in webserver WEBrick, but unfortunately it behaves differently on my production system running Apache / Passenger, which does not replace invalid characters with '?' but truncates the string at the first invalid byte.
On WEBrick I get
' so un\xE4hnlich ' => 'so un?hnlich'
On Apache, with the same code and the same Ruby (1.9.3) and Rails (3.1.1) versions, I get
' so un\xE4hnlich ' => 'so un'
There must be something I can do, but I don't know where to start: the Apache config, the code, the Ruby bundle...?
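One thing that may be worth trying in the code (an assumption, not a verified fix; the helper name is hypothetical): if Passenger hands the parameter over already tagged as UTF-8, the transcoding may take a different path than under WEBrick. Forcing the tag to binary first makes the source encoding unambiguous regardless of what the server attached:

```ruby
# Hypothetical helper: normalize a possibly mis-tagged string to valid UTF-8.
def normalize_utf8(str)
  str.dup.force_encoding('binary')                                # ignore whatever tag the server set
     .encode('utf-8', :invalid => :replace, :undef => :replace,
             :replace => '?')                                     # replace undecodable bytes with '?'
     .gsub(/\s+/, ' ')                                            # collapse whitespace runs
     .strip                                                       # trim leading/trailing whitespace
end

puts normalize_utf8(" so un\xE4hnlich ".force_encoding('UTF-8'))  # => so un?hnlich
```

Calling this in the controller as `params[:data][:text] = normalize_utf8(params[:data][:text])` would replace the one-liner above, and it no longer depends on the encoding tag the server environment attaches to the parameter.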