I have a working LZW compressor/decompressor. Now I want to use a custom variable (size) to track the dictionary size in bytes; when it becomes large enough, I reset the dictionary to its initial state and start filling it again.

But I can't figure out how to implement it. I have made several attempts.

  • First, I flushed the dictionary when it had maxSize elements; however, it may still become too large in bytes.
  • Next, I increased size by the size of each new dictionary element, but the program took several GB of memory and broke everything at the end.

Coder

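void lzwCodeFile(istream &in, ostream &out, uint32_t maxSize) {
    // maxSize is not used yet: the dictionary is never reset.
    unordered_map<string, uint32_t> mp;
    clearAndFill(mp);       // fill with the 256 single-byte strings
    uint32_t code = 256;    // next free code
    uint32_t size = 256;    // sum of the lengths of all stored strings
    string w;
    string tmp;
    uint8_t c;
    while (in.read((char *) &c, sizeof c)) {
        tmp = w;
        tmp += c;
        if (mp.find(tmp) != mp.end()) {
            // w + c is known: keep extending the match.
            w = tmp;
        } else {
            // Emit the code for w and add w + c as a new entry.
            uint32_t val = mp[w];
            out.write((const char *) &val, sizeof val);
            //cout << val << " ";
            mp[tmp] = code++;
            size += tmp.size();
            w = string(1, (char) c);
        }
    }
    // Flush the last pending match.
    if (w.size()) {
        uint32_t val = mp[w];
        out.write((const char *) &val, sizeof val);
        //cout << val << " ";
    }
}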

Decoder

void lzwDecodeFile(istream &in, ostream &out, uint32_t maxSize) {
    // maxSize is not used yet: the dictionary is never reset.
    unordered_map<uint32_t, string> mp;
    uint32_t code = 256;
    uint32_t size = 256;
    clearAndFillRev(mp);
    string tmp, w;
    uint32_t k;
    // Empty input: nothing to decode.
    if (!in.read((char *) &k, sizeof k)) {
        return;
    }
    w = mp[k];
    string entry;
    out << w;
    while (in.read((char *) &k, sizeof k)) {
        if (mp.find(k) != mp.end()) {
            // Code is already in the dictionary.
            entry = mp[k];
        } else {
            // Code not in the dictionary yet: it must be w + first byte of w.
            entry = w + w[0];
        }
        out << entry;
        tmp = w + entry[0];
        mp[code++] = tmp;
        w = entry;
    }
}
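
For reference, here is a minimal sketch of what the byte-budget reset could look like when both sides apply the same rule. It is not the code above: it works on in-memory buffers instead of streams, and the names encodeWithByteBudget, decodeWithByteBudget and wouldOverflow are made up for illustration. The assumption is that "size in bytes" means the sum of the stored string lengths, as in the `size += tmp.size()` attempt. The key point is that the decoder learns each new entry one code later than the encoder, so it has to run the budget check before looking up the next code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical budget rule shared by both sides. It counts string bytes only,
// so real memory use (hash nodes, string headers) is higher than `bytes`.
static bool wouldOverflow(uint32_t bytes, size_t entryLen, uint32_t maxBytes) {
    return uint64_t(bytes) + entryLen > maxBytes;
}

std::vector<uint32_t> encodeWithByteBudget(const std::string &in, uint32_t maxBytes) {
    std::unordered_map<std::string, uint32_t> dict;
    auto reset = [&] {
        dict.clear();
        for (uint32_t i = 0; i < 256; ++i) dict[std::string(1, char(i))] = i;
    };
    reset();
    uint32_t next = 256, bytes = 256;
    std::vector<uint32_t> out;
    std::string w;
    for (char c : in) {
        std::string wc = w + c;
        if (dict.count(wc)) { w = wc; continue; }
        assert(!w.empty());  // every single byte is always in the dictionary
        out.push_back(dict.at(w));
        if (wouldOverflow(bytes, wc.size(), maxBytes)) {
            // Budget exceeded: drop the new entry and start over.
            reset(); next = 256; bytes = 256;
        } else {
            dict[wc] = next++;
            bytes += uint32_t(wc.size());
        }
        w = std::string(1, c);
    }
    if (!w.empty()) out.push_back(dict.at(w));
    return out;
}

std::string decodeWithByteBudget(const std::vector<uint32_t> &codes, uint32_t maxBytes) {
    std::vector<std::string> dict;  // index == code
    auto reset = [&] {
        dict.clear();
        for (uint32_t i = 0; i < 256; ++i) dict.push_back(std::string(1, char(i)));
    };
    reset();
    uint32_t bytes = 256;
    std::string out, prev;
    for (size_t i = 0; i < codes.size(); ++i) {
        uint32_t k = codes[i];
        std::string entry;
        if (i == 0) {
            entry = dict.at(k);
        } else if (wouldOverflow(bytes, prev.size() + 1, maxBytes)) {
            // Same test the encoder ran for the entry prev + <next byte>.
            reset(); bytes = 256;
            entry = dict.at(k);  // throws std::out_of_range on corrupt input
        } else {
            // A code equal to dict.size() is the usual "not yet known" case.
            entry = (k < dict.size()) ? dict[k] : prev + prev[0];
            dict.push_back(prev + entry[0]);
            bytes += uint32_t(prev.size() + 1);
        }
        out += entry;
        prev = entry;
    }
    return out;
}
```

The same maxBytes has to be passed to both functions; otherwise the two dictionaries go out of sync at the first reset.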
Ivnsky
  • `First time i flushed dict then it had maxSize elements, however it may become to large` I guess you mean that the sum of mp[i].size() was too large. That's because you store every pattern as is. The usual trick is not to store a full (abcd) string, but only a pair (codeof(abc), d). This way every entry in your LZW table has a fixed length. – Matt Jul 04 '17 at 12:50
  • Thanks, nice idea, I'll try to use it in my project. – Ivnsky Jul 04 '17 at 14:58
  • @Ivnsky: Basically, you create a binary tree of such pairs mentioned by Matt. Your dictionary is just an array of tree nodes, which you link during the encoding/decoding process. – SBS Aug 05 '17 at 17:30
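
The pair idea from the comments can be sketched roughly as follows. This is only an illustration under assumptions: the names encodePairs, decodePairs, Node and kNoPrefix are invented here, the data lives in memory rather than in streams, and the encoder uses a hash map keyed by (prefix code, byte) instead of explicitly linked tree nodes. Because every entry has a fixed size, limiting the number of codes (maxCodes) also bounds the dictionary size in bytes, up to container overhead.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

constexpr uint32_t kNoPrefix = UINT32_MAX;  // marks "no match started yet"

// Encoder side: key = (prefix code << 8) | next byte, value = code of the pair.
std::vector<uint32_t> encodePairs(const std::string &in, uint32_t maxCodes) {
    std::unordered_map<uint64_t, uint32_t> dict;
    uint32_t next = 256;
    uint32_t w = kNoPrefix;  // code of the current match
    std::vector<uint32_t> out;
    for (unsigned char c : in) {
        if (w == kNoPrefix) { w = c; continue; }
        uint64_t key = (uint64_t(w) << 8) | c;
        auto it = dict.find(key);
        if (it != dict.end()) { w = it->second; continue; }
        out.push_back(w);
        if (next >= maxCodes) { dict.clear(); next = 256; }  // reset
        else dict.emplace(key, next++);
        w = c;
    }
    if (w != kNoPrefix) out.push_back(w);
    return out;
}

// Decoder side: fixed-size entry, dict[i] describes code 256 + i.
struct Node { uint32_t prefix; uint8_t last; };

std::string decodePairs(const std::vector<uint32_t> &codes, uint32_t maxCodes) {
    std::vector<Node> dict;
    std::string out;
    uint32_t prev = kNoPrefix;
    // Rebuild a string by following prefix links back to a single byte.
    auto expand = [&](uint32_t code) {
        std::string s;
        while (code >= 256) {
            const Node &n = dict.at(code - 256);  // throws on corrupt input
            s += char(n.last);
            code = n.prefix;
        }
        s += char(code);
        std::reverse(s.begin(), s.end());
        return s;
    };
    for (uint32_t k : codes) {
        std::string entry;
        if (prev == kNoPrefix) {
            entry = expand(k);
        } else if (256 + dict.size() >= maxCodes) {
            dict.clear();  // mirrors the encoder's reset
            entry = expand(k);
        } else {
            uint32_t nextCode = 256 + uint32_t(dict.size());
            if (k < nextCode) {
                entry = expand(k);
            } else {  // code not defined yet: prev string + its first byte
                entry = expand(prev);
                entry += entry[0];
            }
            dict.push_back({prev, uint8_t(entry[0])});
        }
        out += entry;
        prev = k;
    }
    return out;
}
```

With this layout the decoder needs 8 bytes per entry (after padding) regardless of how long the represented string is, which is the reason the per-entry size no longer has to be tracked separately.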
