Good day to all readers and helpers. I want to make use of a JavaScript function I recently found that LZW-compresses a string.

function lzw_encode(s) {
    var dict = {};
    var data = (s + "").split("");
    var out = [];
    var currChar;
    var phrase = data[0];
    var code = 256;
    for (var i=1; i<data.length; i++) {
        currChar=data[i];
        if (dict[phrase + currChar] != null) {
            phrase += currChar;
        }
        else {
            out.push(phrase.length > 1 ? dict[phrase] : phrase.charCodeAt(0));
            dict[phrase + currChar] = code;
            code++;
            phrase=currChar;
        }
    }
    out.push(phrase.length > 1 ? dict[phrase] : phrase.charCodeAt(0));
    for (var i=0; i<out.length; i++) {
        out[i] = String.fromCharCode(out[i]);
    }
    return out.join("");
}

This function works very well. The only problem is that I want to transfer the encoded string via WebSockets without additional encoding (e.g. Base64), and that does not work every time. Sometimes the compressed string contains characters that cannot be transferred via WebSockets, and a JavaScript error is thrown saying that the string contains illegal characters. So my idea was to use only acceptable characters in the encoding process, like a "whitelist" of characters to be used for compression. As far as I understand the code, it converts each numeric code into the character with that char code, so I thought I could just create my own character set, but I don't really know how to implement it or whether it would even work.

  • Q1: What can I do so that my LZW encoding uses only characters from a string that I define?
  • Q2: How else could I transfer, over HTTP/HTTPS, these Chinese, Arabic, and control characters that WebSockets refuse to transfer?

By the way, this is the error that Chrome throws:

Websocket message contains invalid character(s).
Uncaught Error: SYNTAX_ERR: DOM Exception 12 

Update 1: I thought it might be helpful to see the decoding function as well:

function lzw_decode(s) {
    var dict = {};
    var data = (s + "").split("");
    var currChar = data[0];
    var oldPhrase = currChar;
    var out = [currChar];
    var code = 256;
    var phrase;
    for (var i=1; i<data.length; i++) {
        var currCode = data[i].charCodeAt(0);
        if (currCode < 256) {
            phrase = data[i];
        }
        else {
            phrase = dict[currCode] ? dict[currCode] : (oldPhrase + currChar);
        }
        out.push(phrase);
        currChar = phrase.charAt(0);
        dict[code] = oldPhrase + currChar;
        code++;
        oldPhrase = phrase;
    }
    return out.join("");
}

I guess I would have to implement my custom character set here as well?

beist87

1 Answer

Determine what bytes you can and cannot send. (Hopefully from a reliable source of documentation, as opposed to testing, but verified with testing.)

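Design an escape code where you use one of the valid characters as an escape character, and the next character, also one of the valid characters, encodes a byte you cannot send. The escape character itself must then also be sent in escaped form, so that the decoder can tell a literal occurrence from the start of an escape sequence.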

Apply that to the output of your compressor. It is best to leave the job of compression to the compressor, and not try to saddle it with encoding. You should encode as a separate step.
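
The escaping step described above could look roughly like the following sketch. Which characters are actually rejected is an assumption here (control characters below 0x20 and the surrogate range 0xD800 to 0xDFFF), and should be replaced by whatever the documentation and testing show. The names ESC, escapeEncode, and escapeDecode are made up for illustration.

```javascript
// Hypothetical escape character; it must itself be a character that can be sent.
var ESC = "~";

// ASSUMPTION: code units below 0x20 and in 0xD800..0xDFFF cannot be sent.
// Each one is replaced by ESC followed by a safe character:
//   0x00..0x1F     -> ESC + (code + 0x40)           i.e. "@".."_"
//   0xD800..0xDFFF -> ESC + (code - 0xD800 + 0x100) i.e. U+0100..U+08FF
//   ESC itself     -> ESC + ESC
function escapeEncode(s) {
    var out = [];
    for (var i = 0; i < s.length; i++) {
        var ch = s.charAt(i);
        var c = s.charCodeAt(i);
        if (ch === ESC) {
            out.push(ESC + ESC);
        } else if (c < 0x20) {
            out.push(ESC + String.fromCharCode(c + 0x40));
        } else if (c >= 0xD800 && c <= 0xDFFF) {
            out.push(ESC + String.fromCharCode(c - 0xD800 + 0x100));
        } else {
            out.push(ch);
        }
    }
    return out.join("");
}

// Reverses escapeEncode: every ESC announces one escaped character.
function escapeDecode(s) {
    var out = [];
    for (var i = 0; i < s.length; i++) {
        var ch = s.charAt(i);
        if (ch !== ESC) {
            out.push(ch);
            continue;
        }
        i++; // move to the character that follows the escape character
        var n = s.charCodeAt(i);
        if (s.charAt(i) === ESC) {
            out.push(ESC);
        } else if (n >= 0x100) {
            out.push(String.fromCharCode(n - 0x100 + 0xD800));
        } else {
            out.push(String.fromCharCode(n - 0x40));
        }
    }
    return out.join("");
}
```

The intent is to call escapeEncode on the output of lzw_encode before sending, and escapeDecode on the received string before passing it to lzw_decode, so the compressor itself stays unchanged.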

Don't use LZW. It is ineffective and obsolete compared with modern methods (zlib, lz4, lzma, etc.).

Mark Adler
  • I appreciate your advice and work very much. I was using LZW because of its simplicity as a function and its high compression rate: with it, the resulting image chunks are about 50% smaller. I do this compression on image upload and decompress again on the server side, because my application should make it possible to upload several images at once. Compressing sped the whole process up a lot and makes everything a little smoother; I calculated that I still gain time by compressing before uploading. I did not find any "simple" to use zlib JavaScript library. – beist87 Mar 14 '13 at 21:56
  • Right now I am using Huffman compression, but with Huffman I can only compress by a maximum of about 30%. Do you think it is a bad idea overall to let the client compress something before uploading it? I thought it was kind of innovative. – beist87 Mar 14 '13 at 21:58