
Some pieces of our app are written in Ruby and others are written using node.js.

We share data among them using a redis store that stores zlib chunks. We write to it with the following code using node:

zlib.deflate(xml.toString(), function(error, deflated) {
  ...
  deflated.toString('binary'); // That's the string we write in Redis
  ...
});

Now, we read this data in the redis store using Ruby (1.8.7) and I have to say I'm not sure how to do that.

The typical string we get from the store looks like this:

=> "xuAo \020ÿ\ná.£v½\030dÿCO½±:«¤(\004ƪÿ¾¬®5MÚ\003÷½IÞ q¤°²e°c¼òÈ×\000ó<ùM¸ÐAç\025ÜÈ\r|gê\016Ý/.é\020ãÆî×\003Ôç<Ýù2´F\n¨Å\020!zl \0209\034p|üÀqò\030\036m\020\e`\031¼ÏütÓ=ø¦U/ÔO±\177zB{\037½£-ðBu©ò¢X\000kb­*Ó[V\024Y^½EÎ¥üpúrò­¦\177ÁÃdÈ¢j\0353$a\027²q#¥]*Ýi3J8¤´füd\eså[³öʵ%\fcÇY\037ð¬ÿg§í^¥8£Õ§a¶\001=\r;¡¾\001\020Pí" 

Of course, I tried using Zlib::Inflate.new.inflate(compressed) but that fails with a Zlib::DataError: incorrect header check.

Any idea on what kind of transformation we should do to that string to inflate it from Ruby?

PS: inflating it from node is easy and works, so the problem is not how we compress it.

Julien Genestoux

3 Answers


Any idea on what kind of transformation we should do to that string to inflate it from Ruby?

UTF-8 to Latin-1

Ideally, no transformation would be needed at all, provided you work with Buffers directly on the Node side (see the pair of Node and Ruby code blocks at the bottom of this answer). However, the question asks what can be done on the Ruby side alone, so that is addressed first.

Ruby-only - Convert from UTF-8 to LATIN-1

require 'zlib'
require 'rubygems'
require 'redis'
require 'iconv'

redis = Redis.new

def inflate(buffer)
    zstream = Zlib::Inflate.new
    buf = zstream.inflate(buffer)
    zstream.finish
    zstream.close
    buf
end


def convert(buffer)
    utf8_to_latin1 = Iconv.new("LATIN1//TRANSLIT//IGNORE", "UTF8")
    utf8_to_latin1.iconv(buffer) 
end

value = redis.get("testkey")
value = convert(value)
puts inflate(value)

Explanation

The above code uses iconv to convert the value retrieved from Redis from UTF-8 back to the intended bytes.
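If iconv is not available (it was removed from Ruby's standard library in 2.0), the same conversion can probably be done with pack and unpack alone. The following is only a minimal sketch and not part of the original answer: utf8_to_bytes and simulate_stored_value are hypothetical helper names, and the stored value is simulated in Ruby instead of being read from Redis.

```ruby
require 'zlib'

# Hypothetical stand-in for convert(): decode the UTF-8 text into code
# points, then write each code point (expected to be 0..255) as one byte.
def utf8_to_bytes(buffer)
  buffer.unpack('U*').pack('C*')
end

# Simulate what ends up in Redis: every deflated byte is treated as a
# code point and serialized as UTF-8.
def simulate_stored_value(text)
  Zlib::Deflate.deflate(text).unpack('C*').pack('U*')
end

stored = simulate_stored_value('<entry>hello</entry>')
restored = Zlib::Inflate.inflate(utf8_to_bytes(stored))
puts restored
```

With a real value from redis.get, the equivalent call would be inflate(utf8_to_bytes(value)), using the inflate method defined above.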

When deflating in Node, the resulting buffer contains the correct zlib-generated bytes, and the string returned by toString('binary') matches the contents of that buffer character for character. However, by the time the deflate result is stored in Redis, it has been UTF-8 encoded, so every byte at or above 0x80 is stored as two bytes. An example:

deflating the string "ABCABC" results in:

<Buffer 78 9c 73 74 72 76 74 72 06 00 05 6c 01 8d>

Yet, Redis returns:

<Buffer 78 c2 9c 73 74 72 76 74 72 06 00 05 6c 01 c2 8d>

Hypothesizing a bit, it would seem that the string resulting from toString('binary') ends up as the argument to new Buffer(...) somewhere, perhaps in node-redis. In the absence of an explicit encoding argument to new Buffer(), the default UTF-8 encoding is applied. (See first reference). Hypothesizing further, by using only buffers you avoid the need to create a buffer from the string, and as a result avoid the UTF-8 encoding, so the correct deflate values make it in and out of Redis.
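The byte dumps above can be reproduced from Ruby, which is a quick way to check the hypothesis. This is a sketch under the assumption that the write path treats each byte as a code point and serializes it as UTF-8; the variable names are illustrative, and the byte values are copied from the two buffer dumps in this answer.

```ruby
require 'zlib'

# deflate("ABCABC"), copied from the first Node buffer dump.
original = [0x78, 0x9c, 0x73, 0x74, 0x72, 0x76, 0x74, 0x72,
            0x06, 0x00, 0x05, 0x6c, 0x01, 0x8d]

# Assumed write path: each byte becomes a code point, and the code
# points are then serialized as UTF-8.
stored = original.pack('U*').unpack('C*')
hex = stored.map { |b| '%02x' % b }.join(' ')
puts hex # 78 c2 9c 73 74 72 76 74 72 06 00 05 6c 01 c2 8d

# Inflating the stored bytes as they are fails with Zlib::DataError,
# the error class reported in the question.
begin
  Zlib::Inflate.inflate(stored.pack('C*'))
  failure = nil
rescue Zlib::DataError => e
  failure = e.message
end
puts failure

# Undoing the UTF-8 step restores the original bytes, which inflate.
recovered = stored.pack('C*').unpack('U*')
inflated = Zlib::Inflate.inflate(recovered.pack('C*'))
puts inflated # ABCABC
```

Only 0x9c and 0x8d are at or above 0x80, which is why exactly two extra 0xc2 bytes appear in the value Redis returns.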

References

Node

var zlib = require('zlib');
var redis = require("redis").createClient();

var message = new Buffer('your stuff goes here.');
//var message = new Buffer(xml.toString());

redis.on("error", function (err) {
    console.log("Error " + err);
});

redis.on("connect", function() {
    console.log(message);
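    zlib.deflate(message, function(error, deflated) {
        if (error) return console.error(error); // compression failed
        console.log(deflated);
        redis.set("testkey", deflated, function (err, reply) {
            if (err) return console.error(err); // reply is not set on error
            console.log(reply.toString());
        });
    });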
});

Ruby

require 'zlib'
require 'rubygems'
require 'redis'

redis = Redis.new

def inflate(buffer)
    zstream = Zlib::Inflate.new
    buf = zstream.inflate(buffer)
    zstream.finish
    zstream.close
    buf
end

value = redis.get("testkey")    

puts inflate(value)
Kevin Viggers

If you are using node-redis to save the data, it handles Buffers directly, so you can simply call client.set(key, buff) or client.append(key, buff). You don't need (or want) to do any conversion.

Node.js (simplified from Kevin)

var zlib = require('zlib');
var redis = require("redis");
var rc = redis.createClient(null, null, {detect_buffers: true}); // allow Buffers

var message = new Buffer('My message');

zlib.deflate(message, function (err, deflated) {
  if (err) return console.error(err);
  rc.set("testkey", deflated, function (err, result) {
    if (err) return console.error(err);
    rc.quit();
  });
});

Ruby code (copied from Kevin above)

require 'zlib'
require 'rubygems'
require 'redis'

redis = Redis.new

def inflate(buffer)
    zstream = Zlib::Inflate.new
    buf = zstream.inflate(buffer)
    zstream.finish
    zstream.close
    buf
end

value = redis.get("testkey")

puts inflate(value)

That retrieves the value properly, but changing the Node.js code to use .toString('binary'), as in your original code, breaks the Ruby decoding just as you described.

Here is an example showing that a round trip through toString('binary') changes the data:

 console.log(deflated);
 console.log(new Buffer(deflated.toString('binary')));

I can't figure out exactly what transformation Buffer.toString('binary') is doing, since I believe it goes into the V8 Buffer code.

But if you are still able to read it with Node, you might want to extract it and save it again the proper way: skip .toString('binary') and give the Buffer directly to the redis client's set method, which will save it properly.

Then it will be stored as binary and you can read it with ruby correctly using code like above.

As for your Node.js code, once it saves the data as binary properly (passing the Buffer directly to the set call), you can retrieve it like this:

var rc = redis.createClient(null, null, {detect_buffers: true}); // allow Buffers
rc.get(new Buffer(key), function (err, buff) {  // use a Buffer for the key
   // buff is a Buffer now
});

With detect_buffers turned on for node-redis, passing a Buffer as the key makes the client return the value as a Buffer without converting it.

Alternatively, you could use the return_buffers: true option, but I like detect_buffers because it lets you use the same client for both Buffer and non-Buffer data.

PS. Make sure you are using one of the latest versions of the Ruby gem, not an old one like 1.x (2.x added binary fixes).

Jeff Barczewski
  • Thanks a lot Jeff! That's actually exactly how I went. Unfortunately at this point, we cannot change the code that "writes" (the node code), so I had to find what transformation was needed on the string to get the right value back. Kevin nailed it. Thanks for your precious help though. – Julien Genestoux Nov 15 '12 at 13:31

The act of converting using toString has already put you in a state of sin. You need to preserve and transmit the original binary buffer produced by deflate with no conversion of any kind in order for the inflate in Ruby to be able to decode it.

It is not clear what conversion 'binary' does, but it probably strips nulls, which would mess up the data. In any case, the documentation says that binary should not be used and is being deprecated. You need to find a way to pass on the original deflated data in the Buffer class directly or, if you really need a string, convert it to a string format that you can reverse in Ruby before trying to inflate, e.g. base64.
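For the base64 route, the Ruby side could look roughly like the sketch below. It assumes the writer stores the deflated buffer as a base64 string (in Node, that would be the 'base64' encoding instead of 'binary'); store_value and load_value are hypothetical helper names, and store_value only stands in for the Node writer so that the example is self-contained.

```ruby
require 'zlib'
require 'base64'

# Stand-in for the writer: deflate the text and base64 encode the result.
def store_value(text)
  Base64.encode64(Zlib::Deflate.deflate(text))
end

# Reader: reverse the base64 step first, then inflate the raw bytes.
def load_value(stored)
  Zlib::Inflate.inflate(Base64.decode64(stored))
end

xml = '<feed><entry>hello</entry></feed>'
stored = store_value(xml)
round_tripped = load_value(stored)
puts round_tripped
```

The trade-off is size: base64 uses four characters for every three bytes, so the stored value is about a third larger than the raw deflated data.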

Mark Adler
  • Thanks Mark. I'm tempted to think that the data is not corrupted, because I'm actually able to read it again from a Node.js app. Also, AFAIK, we need to call toString(), because as I've stated, we store these buffers into redis, which only accepts strings :/ – Julien Genestoux Nov 06 '12 at 14:20
  • Also, we picked binary because it was relatively "cheap". We need to have the string stored be the smallest possible. – Julien Genestoux Nov 06 '12 at 15:00
  • How about you just take a buffer of the bytes 0 to 255, do the "binary" conversion, and see what comes out? – Mark Adler Nov 07 '12 at 05:12