
I'm having trouble implementing an LZW compressor. The compressor seems to work fine, but for some input streams it doesn't write the end-of-stream code (defined with the value 256), and as a result the decompressor loops forever. The code of the compressor is the following:

int compress1(FILE* input, BIT_FILE* output) {
    CODE next_code;         // next node
    CODE current_code;      // current node
    CODE index;             // node of the found character
    int character;
    int ret;

    next_code = FIRST_CODE;

    dictionary_init();

    if ((current_code = getc(input)) == EOF)
        current_code = EOS;

    while ((character = getc(input)) != EOF) {
        index  = dictionary_lookup(current_code, (SYMBOL)character);
        if (dictionary[index].code != UNUSED) {
            current_code = dictionary[index].code;
        }
        else {
            if (next_code <= MAX_CODE-1) {
                dictionary[index].code = next_code++;
                dictionary[index].parent = current_code;
                dictionary[index].symbol = (SYMBOL)character;
            }
            else {
                // handling full dictionary
                dictionary_init();
                next_code = FIRST_CODE;
            }
            ret = bit_write(output, (uint64_t) current_code, BITS);
            if (ret != 0)
                return -1;

            current_code = (CODE)character;
        }
    }
    ret = bit_write(output, (uint64_t) current_code, BITS);
    if (ret != 0)
        return -1;

    ret = bit_write(output, (uint64_t) EOS, BITS);
    if (ret != 0)
        return -1;

    if (bit_close(output) == -1) {
        printf("Ops: error during closing\n");
        return -1;
    }

    return 0;
}

CODE and SYMBOL are typedefs for uint32_t and uint16_t respectively, and FIRST_CODE is defined as 257. The function dictionary_init() simply initializes the dictionary; dictionary_lookup() returns the index of the child of the parent node "current_code" that has the symbol "character" (if it exists).
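For readers who want to compile the compressor, here is a minimal sketch of what those declarations and the dictionary helpers might look like. Only the two typedefs, EOS = 256, FIRST_CODE = 257 and the 12-bit code width come from the question; the values of MAX_CODE, UNUSED and TABLE_SIZE, the entry layout, and the open-addressing hash are assumptions made up for illustration and are not the asker's actual implementation.

```c
#include <stdint.h>

typedef uint32_t CODE;    /* from the question */
typedef uint16_t SYMBOL;  /* from the question */

#define BITS        12                     /* code width used in the dumps */
#define EOS         256u                   /* from the question */
#define FIRST_CODE  257u                   /* from the question */
#define MAX_CODE    ((1u << BITS) - 1u)    /* assumption */
#define UNUSED      0xFFFFFFFFu            /* assumption: marks an empty slot */
#define TABLE_SIZE  5021u                  /* assumption: a prime above 2^BITS */

/* Assumed layout: one slot per (parent code, symbol) pair. */
struct entry {
    CODE   code;
    CODE   parent;
    SYMBOL symbol;
};

static struct entry dictionary[TABLE_SIZE];

/* Mark every slot as empty. */
void dictionary_init(void)
{
    for (CODE i = 0; i < TABLE_SIZE; i++)
        dictionary[i].code = UNUSED;
}

/* Return the slot that holds (parent, symbol), or the empty slot where
 * that pair would be inserted. Linear probing resolves collisions; the
 * table always has free slots because at most 2^BITS codes exist. */
CODE dictionary_lookup(CODE parent, SYMBOL symbol)
{
    CODE index = (((CODE)symbol << 4) ^ parent) % TABLE_SIZE;

    while (dictionary[index].code != UNUSED &&
           (dictionary[index].parent != parent ||
            dictionary[index].symbol != symbol)) {
        index = (index + 1) % TABLE_SIZE;
    }
    return index;
}
```

With this shape, the caller tells "found" from "not found" by testing dictionary[index].code against UNUSED, which is how compress1 uses the returned index.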

The function that writes to the binary file is defined as:

int bit_write(BIT_FILE* bf, uint64_t data, int len)
{
    int space, result, offset, wbits, udata;
    uint64_t* p;
    uint64_t tmp;

    udata = (int)data;

    if (bf == NULL || len < 1 || len > (8* sizeof(data)))
        return -1;

    if (bf->reading == true)
        return -1;

    while (len > 0) {
        space = bf->end - bf->next;
        if (space < 0) {
            return -1;
        }
        // if buffer is full, flush data to file and reinit BIT_IO struct
        if (space == 0) {
            result = bit_flush(bf);
            if (result < 0)
                return -1;
        }

        p = bf->buf + (bf->next/64);
        offset = bf->next % 64;
        wbits = 64 - offset;

        if (len < wbits)
            wbits = len;

        tmp = le64toh(*p);
        tmp |= (data << offset);
        *p = htole64(tmp);

        bf->next += wbits;
        len -= wbits;
        data >>= wbits;
    }

    return 0;
}

The file has already been opened by another function, so bit_write takes a pointer to the bf structure as input. Can someone help me find the error?

An example of when this problem arises is the following:

If the input string is "Nel mezzo del cammi", everything works fine and I get the following compressed file (in hexadecimal, using 12 bits to encode each symbol):

4E 50 06 6C 00 02 6D 50 06 7A A0 07 6F 00 02 64
20 10 20 30 06 61 D0 06 6D 90 06 0D A0 00 00 01

If I add one more character to the string, giving "Nel mezzo del cammin", I get the following result:

4E 50 06 6C 00 02 6D 50 06 7A A0 07 6F 00 02 64
20 10 20 30 06 61 D0 06 6D 90 06 6E D0 00 0A 00
10

In the second case the end-of-stream code is not written correctly.
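To compare the two dumps code by code, a small helper that splits the bytes back into 12-bit groups can help. The sketch below assumes the codes are packed least-significant bit first, which is what the shifts in bit_write and the little-endian conversion suggest; the function name unpack_codes and its parameters are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Split a byte dump into fixed-width codes, assuming LSB-first packing.
 * `bits` is the code width (1..24). Returns the number of complete codes
 * stored in `codes`; trailing padding bits are ignored. */
size_t unpack_codes(const uint8_t *bytes, size_t nbytes, unsigned bits,
                    uint32_t *codes, size_t max_codes)
{
    uint64_t acc = 0;    /* bits read but not yet consumed */
    unsigned have = 0;   /* number of valid bits in acc */
    size_t n = 0;

    for (size_t i = 0; i < nbytes && n < max_codes; i++) {
        acc |= (uint64_t)bytes[i] << have;   /* append 8 new bits on top */
        have += 8;
        while (have >= bits && n < max_codes) {
            codes[n++] = (uint32_t)(acc & ((1u << bits) - 1u));
            acc >>= bits;
            have -= bits;
        }
    }
    return n;
}
```

The byte counts alone already say something: the first dump is 32 bytes (256 bits), which is 21 codes of 12 bits plus 4 padding bits, while the second is 33 bytes (264 bits), which is exactly 22 codes. The 22nd code therefore occupies bits 252 to 263 and straddles bit 256, a 64-bit word boundary in the buffer used by bit_write.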

SOLUTION: before writing, check that the buffer has enough space left for the whole coded symbol. Just change:

if (space == 0)

to:

if (space == 0 && space < len)
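The idea of making room for a whole code before writing it can be sketched in isolation. Note that inside the loop len is always positive, so `space == 0 && space < len` holds exactly when `space == 0`; the sketch below therefore tests `space < len` on its own, which seems to be what the description intends. Everything here (the bit_sink struct, BUF_BYTES, and the sink_* functions) is a hypothetical stand-in for BIT_FILE, bit_flush and bit_close, whose real definitions are not shown in the question. It uses a byte buffer rather than 64-bit words so that no endianness conversion is needed.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BUF_BYTES 32u                 /* hypothetical buffer size */

typedef struct {
    FILE    *fp;                      /* destination, opened by the caller */
    uint8_t  buf[BUF_BYTES];          /* bit buffer, must start zeroed */
    unsigned next;                    /* index of the next free bit */
} bit_sink;                           /* hypothetical stand-in for BIT_FILE */

/* Write out every complete byte and keep a trailing partial byte at the
 * start of the buffer, so no padding ends up in the middle of the stream. */
static int sink_flush(bit_sink *s)
{
    unsigned full = s->next / 8;
    uint8_t  tail = (s->next % 8) ? s->buf[full] : 0;

    if (fwrite(s->buf, 1, full, s->fp) != full)
        return -1;
    memset(s->buf, 0, sizeof s->buf);
    s->buf[0] = tail;
    s->next %= 8;
    return 0;
}

/* Append the low `len` bits of `data`, LSB first (1 <= len <= 32). */
int sink_write(bit_sink *s, uint32_t data, unsigned len)
{
    if (s == NULL || len < 1 || len > 32)
        return -1;

    /* Make room for the whole code before touching the buffer. */
    if (BUF_BYTES * 8 - s->next < len && sink_flush(s) != 0)
        return -1;

    while (len > 0) {
        unsigned offset = s->next % 8;
        unsigned wbits  = 8 - offset;     /* bits left in the current byte */
        if (wbits > len)
            wbits = len;
        s->buf[s->next / 8] |=
            (uint8_t)((data & ((1u << wbits) - 1u)) << offset);
        s->next += wbits;
        data >>= wbits;
        len  -= wbits;
    }
    return 0;
}

/* Write whatever is left, padding the last byte with zero bits.
 * The FILE itself stays open; closing it is left to the caller. */
int sink_close(bit_sink *s)
{
    unsigned nbytes = (s->next + 7) / 8;

    if (fwrite(s->buf, 1, nbytes, s->fp) != nbytes)
        return -1;
    memset(s->buf, 0, sizeof s->buf);
    s->next = 0;
    return 0;
}
```

Because the space check runs once per code rather than once per loop iteration, a code is never started in a buffer that cannot hold all of it. Carrying the partial byte across the flush is a design choice of this sketch; a writer that pads on every flush would need a reader that skips the same padding.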
damaar
  • Is the problem reproducible, i.e. do you get the same "no end of stream character" result on the same inputs? If so, show us an example of a problematic input. – Ilya May 20 '15 at 12:02
  • Does `bit_write` return -1 in the cases when it doesn't put EOS? – dragi May 20 '15 at 15:25
  • I added an example execution to show you the compressed file. `bit_write` returns -1 in case of errors during file processing (open, close, flush, etc.), but not in the case you mentioned. – damaar May 23 '15 at 07:59
  • I finally found the solution to this problem. I have to check that there is enough space in the buffer for an entire coded symbol, otherwise the symbol gets split and can't be recovered anymore. So a small change makes it work: `if (space == 0 && space < len)` – damaar May 23 '15 at 14:50
