I have a Rails server receiving input from an Android app. The app passes data to the server in JSON format, but the current version of the app does not encode strings as UTF-8, so they remain 'binary', containing e.g. '\xE0' instead of '\u00E0'. However, when sending this binary string, the app declares the HTTP connection's encoding as UTF-8, so the web server receiving the string believes it is UTF-8 when it is actually binary.
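For illustration, the mismatch can be reproduced in a plain Ruby session (a minimal sketch; the byte \xE4 is the Latin-1 encoding of 'ä'):

```ruby
# Binary bytes mislabeled as UTF-8, like the data arriving from the app:
s = " so un\xE4hnlich ".force_encoding('UTF-8')

# The string claims to be UTF-8, but \xE4 is not a valid UTF-8 sequence,
# so validation fails:
puts s.encoding          # => UTF-8
puts s.valid_encoding?   # => false
```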
I partially resolved this with the following line:
# encode comment to UTF-8, replace invalid bytes, collapse whitespace and strip the comment field
params[:data][:text] = params[:data][:text].encode('utf-8', 'binary', :invalid => :replace, :undef => :replace).gsub(/\s+/, " ").strip
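For context, this is what the encode call does on a plain Ruby string, outside of any web server (a sketch; :replace => '?' is spelled out explicitly here, because the documented default replacement for a Unicode target encoding is U+FFFD rather than '?'):

```ruby
raw = " so un\xE4hnlich "

# The source encoding is given explicitly as 'binary', so the string's
# own encoding tag is ignored; the byte \xE4, which has no defined
# conversion from binary to UTF-8, is replaced with '?':
clean = raw.encode('utf-8', 'binary',
                   :invalid => :replace, :undef => :replace, :replace => '?')
puts clean  # => " so un?hnlich "
```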
This was taken from here: Ruby String.encode still gives "invalid byte sequence in UTF-8"
It works fine on my development system using the built-in webserver WEBrick, but unfortunately it behaves differently on my production system running Apache / Passenger, which does not replace invalid characters with '?' but truncates the string at the first invalid byte.
On WEBrick I get
' so un\xE4hnlich ' => 'so un?hnlich'
On Apache, with the same code and the same Ruby (1.9.3) and Rails (3.1.1) versions, I get
' so un\xE4hnlich ' => 'so un'
There must be something I can do, but I don't know where to start: the Apache config, the code, the Ruby bundle...?
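One thing that may be worth trying in the code (an assumption, not a verified fix; the helper name is hypothetical): if Passenger hands the parameter over already tagged as UTF-8, the transcoding may take a different path than under WEBrick. Forcing the tag to binary first makes the source encoding unambiguous regardless of what the server attached:

```ruby
# Hypothetical helper: normalize a possibly mis-tagged string to valid UTF-8.
def normalize_utf8(str)
  str.dup.force_encoding('binary')                                # ignore whatever tag the server set
     .encode('utf-8', :invalid => :replace, :undef => :replace,
             :replace => '?')                                     # replace undecodable bytes with '?'
     .gsub(/\s+/, ' ')                                            # collapse whitespace runs
     .strip                                                       # trim leading/trailing whitespace
end

puts normalize_utf8(" so un\xE4hnlich ".force_encoding('UTF-8'))  # => so un?hnlich
```

Calling this in the controller as `params[:data][:text] = normalize_utf8(params[:data][:text])` would replace the one-liner above, and it no longer depends on the encoding tag the server environment attaches to the parameter.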