Sinatra binary return for msgpack -- charset issue/ characters being converted somewhere?

Question

I'm currently trying to return msgpack (http://msgpack.org/) from a Ruby Sinatra service and parse it using JavaScript. I am using the JavaScript library found here: https://github.com/uupaa/msgpack.js/ (though I don't think that's relevant to this question).

I have a sinatra service that does the following using the msgpack gem:

require 'sinatra'
require 'msgpack'

get '/t' do
  content_type 'application/x-msgpack'
  { :status => 'success', :data => {:one => "two", :three => "four"}}.to_msgpack
end

I have javascript that reads it as follows:

<script src="js/jquery.js"></script>
<script src="js/msgpack.js"></script>
<script type="text/javascript">

    function r() {
        $.ajaxSetup({
            converters: {
                "text msgpack": function( packed ) {
                    if(packed != '') {
                        var unpacked = msgpack.unpack(packed);
                        return unpacked;
                    }else{
                        return '';
                    }
                }
            }
        });

        $.ajax({
            type: "GET",
            url: "/t",
            dataType: "msgpack",
            success: function(data) {
                alert(data);
            }
        });
    }
    $(document).ready(r)
</script>

The problem is that when I get the data back, many characters have been converted from their server-side values to 0xfffd (U+FFFD, the Unicode replacement character).
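U+FFFD is what a UTF-8 decoder emits for bytes that are not valid UTF-8, so the symptom is consistent with the response body being decoded as UTF-8 text before the converter ever sees it. A minimal sketch of that loss, assuming a runtime with the standard `TextDecoder` (a modern browser or Node.js 11+); the byte values are illustrative and assumed to resemble the start of the server's payload:

```javascript
// Illustrative MessagePack bytes: 0x82 (fixmap, 2 entries), 0xa6 (fixstr, 6 bytes),
// followed by the ASCII bytes of "status".
const packedBytes = Uint8Array.of(0x82, 0xa6, 0x73, 0x74, 0x61, 0x74, 0x75, 0x73);

// Decoding binary data as UTF-8 (non-fatal mode) replaces each invalid byte with U+FFFD.
const asText = new TextDecoder("utf-8").decode(packedBytes);

// 0x82 and 0xa6 are stray continuation bytes in UTF-8, so both become U+FFFD and
// their original values cannot be recovered from the string afterwards.
console.log(Array.from(asText, (ch) => ch.codePointAt(0).toString(16)));
```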

I then tried the two variants:

content_type 'application/octet-stream'

and

content_type 'application/octet-stream', :charset => 'binary'

on the server side. The former didn't change anything but the latter came closer, leaving most of the message untouched with one exception: the first character was converted from 0x82 to 0x201a.
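The first byte matters more than the others. Per the MessagePack spec, bytes 0x80 to 0x8f are the *fixmap* header, with the low four bits giving the number of key/value pairs, so 0x82 announces a 2-entry map (`status` and `data`). A short sketch of that check; the helper name `fixmapSize` is hypothetical and not part of msgpack.js:

```javascript
// Returns the entry count if `byte` is a MessagePack fixmap header (0x80..0x8f), else -1.
function fixmapSize(byte) {
  return (byte & 0xf0) === 0x80 ? byte & 0x0f : -1;
}

// The byte the server sent: a fixmap holding 2 entries.
console.log(fixmapSize(0x82));

// The value the browser handed over instead (U+201A) is not a fixmap header at all,
// so an unpacker cannot recognise the top-level map.
console.log(fixmapSize(0x201a));
```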

I suspect that there is a combination of charset and content type that would fix this that I haven't tried yet. I could always fall back to Base64, but I'd like to understand what it takes to get it working without Base64 first.

ruby 1.9.2p290 (2011-07-09 revision 32553) [i686-linux] on Ubuntu 10.10 64-bit. I'm also using Sinatra 1.2.6 with Rack 1.3.2, hosted using thin 1.2.11. — Michael Wasser, Oct 26 '11 at 17:38
That said, I just tried getting a response using net-http, and the conversion does not appear to be happening server side. — Michael Wasser, Oct 26 '11 at 17:45

gioele · Accepted Answer · 2011-10-26T17:49:59.550

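0x82 is SINGLE LOW-9 QUOTATION MARK in Windows-1252 (the superset of Latin-1 that browsers actually apply to content labelled ISO-8859-1), and 0x201a (U+201A) is the same character in UTF-16. Have a look at how your libraries deal with encoding: tell them to use a binary encoding and not to attempt any conversion between encodings.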

UTF-16 smells of JavaScript. If you use jQuery, have a look at http://blog.vjeux.com/2011/javascript/jquery-binary-ajax.html.
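One widely used technique for browsers without typed-array support is to make XHR decode the body with the `x-user-defined` charset, which maps bytes 0x80 to 0xff onto U+F780 to U+F7FF, and then mask each code unit with 0xff to get the original byte back. A sketch under that assumption; `binaryStringToBytes` is a hypothetical helper, and the jQuery usage is commented out because it needs a browser:

```javascript
// Recover raw bytes from a string decoded with the x-user-defined charset.
// ASCII bytes map to themselves; 0x80..0xff map to U+F780..U+F7FF, so the low
// 8 bits of each code unit equal the original byte.
function binaryStringToBytes(str) {
  const bytes = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    bytes[i] = str.charCodeAt(i) & 0xff;
  }
  return bytes;
}

// Hypothetical browser usage (not run here). Assumes msgpack.js's unpack accepts
// a plain array of byte values.
//
// $.ajax({
//   url: "/t",
//   dataType: "text",
//   beforeSend: function (xhr) {
//     xhr.overrideMimeType("text/plain; charset=x-user-defined");
//   },
//   success: function (text) {
//     var data = msgpack.unpack(Array.from(binaryStringToBytes(text)));
//   }
// });

// Simulated x-user-defined text for the bytes 0x82 0xa6 0x73:
const simulated = String.fromCharCode(0xf782, 0xf7a6, 0x0073);
console.log(Array.from(binaryStringToBytes(simulated)));
```

The mask only works because x-user-defined keeps the low byte intact. With the Windows-1252 decoding seen in the question, 0x82 becomes U+201A, and `0x201a & 0xff` is 0x1a, so the original byte would be lost.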

answered Oct 26 '11 at 17:21 by gioele (edited Oct 26 '11 at 17:49)

So now I'm just trying to make it work without jQuery (using the link you sent as well as https://developer.mozilla.org/En/XMLHttpRequest/Using_XMLHttpRequest#Receiving_binary_data_using_JavaScript_typed_arrays). However, the more I dig into this, the more it seems the best solution is to just convert to Base64 so that there are fewer browser compatibility issues. I'm in relatively unfamiliar territory, so I believe it will be more maintainable as well. Thanks for the help! – Michael Wasser Oct 26 '11 at 18:12
Actually, the real problem is that few library developers are familiar with the problem. Encodings bite a lot these days. :) – gioele Oct 26 '11 at 19:48
Sinatra does not touch the body's encoding. – Konstantin Haase Oct 27 '11 at 06:24
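Both routes discussed in the comments (XHR2 typed arrays, or a Base64 fallback) can be sketched briefly. With typed arrays, setting `responseType = "arraybuffer"` hands back the body as undecoded bytes. With Base64, the server would have to encode the packed string first (for example with Ruby's `Base64.strict_encode64`, an assumption about the server change rather than something from this thread) and the client decodes it back to bytes. Both function names are hypothetical:

```javascript
// Typed-array route (browser only): "arraybuffer" skips text decoding entirely.
function getMsgpackBytes(url, onBytes) {
  const xhr = new XMLHttpRequest();
  xhr.open("GET", url);
  xhr.responseType = "arraybuffer";
  xhr.onload = function () {
    if (xhr.status === 200) {
      onBytes(new Uint8Array(xhr.response));
    }
  };
  xhr.send();
}

// Base64 route: the body is pure ASCII, so no charset can corrupt it.
// atob returns a string whose code units are exactly the decoded byte values (0..255).
function base64ToBytes(b64) {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// "gqY=" is the Base64 form of the two header bytes 0x82 0xa6.
console.log(Array.from(base64ToBytes("gqY=")));
```

The Base64 route costs about a third more bytes on the wire but sidesteps charset handling in every browser; the typed-array route keeps the payload compact but needs XHR2 support.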
