My LZW compression implementation produces an output file that is larger than the input file.

Here is the code for the compression function:

// compression
void compress(FILE *inputFile, FILE *outputFile) {    
    int prefix;
    int character;

    int nextCode;
    int index;

    // LZW starts out with a dictionary of 256 characters (in the case of 8 codeLength) and uses those as the "standard"
    //  character set.
    nextCode = 256; // next code is the next available string code
    dictionaryInit();

    // while (there is still data to be read)
    while ((character = getc(inputFile)) != EOF) { // ch = read a character;

        // if (dictionary contains prefix+character)
        if ((index = dictionaryLookup(prefix, character)) != -1) prefix = index; // prefix = prefix+character
        else { // ...no, try to add it
            // encode s to output file
            writeBinary(outputFile, prefix);

            // add prefix+character to dictionary
            if (nextCode < dictionarySize) dictionaryAdd(prefix, character, nextCode++);

            // prefix = character
            prefix = character; //... output the last string after adding the new one
        }
    }
    // encode s to output file
    writeBinary(outputFile, prefix); // output the last code

    if (leftover > 0) fputc(leftoverBits << 4, outputFile);

    // free the dictionary here
    dictionaryDestroy();
}

The writeBinary function, which acts as the output buffer in the program, is as follows:

void writeBinary(FILE * output, int code);

int leftover = 0;
int leftoverBits;

void writeBinary(FILE * output, int code) {
    if (leftover > 0) {
        int previousCode = (leftoverBits << 4) + (code >> 8);

        fputc(previousCode, output);
        fputc(code, output);

        leftover = 0; // no leftover now
    } else {
        leftoverBits = code & 0xF; // save leftover, the last 00001111
        leftover = 1;

        fputc(code >> 4, output);
    }
}

Can you spot the error, please? I'll be grateful!

sana
  • Is this a guessing game? Where's Waldo? What error? Please describe the problem... – old_timer May 18 '17 at 02:45
  • The compressed file is larger than the uncompressed file. The algorithm is supposed to use fewer bits for the compressed version. There is a logical error that I am unable to find. Could you please point it out? – sana May 18 '17 at 03:02
  • Not all datasets will compress with a particular algorithm. Do you have reference code that works to confirm this data should compress? – old_timer May 18 '17 at 03:07
  • Why 4 in `leftoverBits << 4`? `fputc(previousCode, output); fputc(code, output);` writes 16 bits. It should only write 9, 10, 11... IMO `writeBinary()` and ()v) are totally amiss and not salvageable if code is truly attempting LZW. – chux - Reinstate Monica May 18 '17 at 03:20
  • Yes, for example the string 'TOBEORNOTTOBEORTOBEORNOT#'. I have saved it in a .txt file, which takes 25 bytes. After compression, the compressed file takes 27 bytes, but it should reduce the size by up to 22% of the original file size. – sana May 18 '17 at 03:21
  • @chux Because I assume that the maximum code length takes 12 bits in dictionary. So, after taking 8 bits, 4 bits are remaining. So, before taking new bits, first 4 of the previous one are read (shifted). – sana May 18 '17 at 03:24
  • LZW uses [variable-width](https://en.wikipedia.org/wiki/Lempel–Ziv–Welch#Algorithm), not fixed 12. – chux - Reinstate Monica May 18 '17 at 03:28
  • @chux If I change leftoverBits << 4 to leftoverBits << 8, it doesn't make any difference. – sana May 18 '17 at 03:37
  • Try working through https://ericlippert.com/2014/03/05/how-to-debug-small-programs/ – Yunnosch May 18 '17 at 05:13
  • What is your reference (not "should reduce the size up to 22%") for the compression achieved by LZW for your sample input? As @old_timer mentioned, there is no algorithm which will achieve compression on ALL input data. Admittedly, your sample input very much looks like it should be well compressible. – Yunnosch May 18 '17 at 05:17
  • Please provide a [mcve], if your problem is only runtime behaviour (not compile time), then please make it easy for us to participate. – Yunnosch May 18 '17 at 05:18
  • @Yunnosch https://en.m.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch – sana May 18 '17 at 05:34
  • Very good, now edit that as useful information into your question. Then start your debugger and follow your code step by step on the way to failure, comparing the detailed information in your link. Then edit your question to describe where exactly in your code and where exactly in the sample input it leaves the track and how. Then you will soon find help with finding the error, should it still be needed at that point. If not, do not forget to write an answer here. It probably makes an interesting Q/A pair (no sarcasm here). – Yunnosch May 18 '17 at 06:33
  • If you have reference code, it will also tell you what the answer is supposed to be. Either add code as you go to confirm it is working on a byte-by-byte basis, or after compressing find out how deep you got before it failed. You only need enough input data to hit that fail point, and you then have a marker that tells you where to look, which helps to find the problem. If you don't have reference code, then you are making up your own compression? – old_timer May 18 '17 at 13:27
  • note that your problem might not be algorithm it might be data management, you might have a pointer slip and overwrite something or be saving each block of output twice instead of once, this should show patterns, esp if you have reference code that produces the desired output, starting at the fail point see if the pattern at the fail point exists at some other offset either in your output or the reference code output. – old_timer May 18 '17 at 13:29

1 Answer

chux already pointed you to the solution: you need to start out with 9-bit codes and increase the code size, up to 12 bits, whenever the available codes for the current bit size are exhausted. If you're writing 12-bit codes from the beginning, every code costs 1.5 bytes, so on a short input like yours there's no compression effect, of course.
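
To make the idea concrete, here is a minimal sketch of a variable-width bit packer. It is not the asker's code and not a reference implementation: `BitWriter`, `bwPut`, `bwFlush` and `packCodes` are hypothetical names, and it assumes the encoder adds one dictionary entry per emitted code starting at code 256. The exact moment at which the width grows is a convention, so a decoder would have to mirror whatever rule is chosen here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical bit-packing state; none of these names come from the question. */
typedef struct {
    FILE *out;
    unsigned long acc;  /* pending bits, oldest bits in the higher positions */
    int nbits;          /* number of unwritten bits in acc (0..7 between calls) */
    long bytes;         /* bytes written so far */
} BitWriter;

/* Append the low `width` bits of `code`, writing whole bytes as they fill up. */
static void bwPut(BitWriter *bw, unsigned code, int width) {
    assert(width >= 1 && width <= 16);
    bw->acc = (bw->acc << width) | (code & ((1ul << width) - 1ul));
    bw->nbits += width;
    while (bw->nbits >= 8) {
        bw->nbits -= 8;
        fputc((int)((bw->acc >> bw->nbits) & 0xFFul), bw->out);
        bw->bytes++;
    }
    bw->acc &= (1ul << bw->nbits) - 1ul;  /* keep only the unwritten bits */
}

/* Pad the last partial byte with zero bits and write it. */
static void bwFlush(BitWriter *bw) {
    if (bw->nbits > 0) {
        fputc((int)((bw->acc << (8 - bw->nbits)) & 0xFFul), bw->out);
        bw->bytes++;
        bw->acc = 0;
        bw->nbits = 0;
    }
}

/*
 * Pack `n` LZW codes. Assumption: the encoder adds one dictionary entry per
 * emitted code, starting at 256. The width starts at `startWidth` and grows
 * by one bit whenever the largest code that could be emitted no longer fits,
 * up to `maxWidth`. Returns the number of bytes written.
 */
static long packCodes(FILE *out, const int *codes, int n,
                      int startWidth, int maxWidth) {
    BitWriter bw = { out, 0ul, 0, 0L };
    long nextCode = 256;  /* next free dictionary code */
    int width = startWidth;
    for (int i = 0; i < n; i++) {
        while (width < maxWidth && nextCode > (1L << width)) width++;
        bwPut(&bw, (unsigned)codes[i], width);
        if (nextCode < (1L << maxWidth)) nextCode++;
    }
    bwFlush(&bw);
    return bw.bytes;
}
```

For the sample string 'TOBEORNOTTOBEORTOBEORNOT#', LZW with a 256-entry initial dictionary should emit 17 codes (T, O, B, E, O, R, N, O, T, 256, 258, 260, 265, 259, 261, 263, #). Calling `packCodes(out, codes, 17, 9, 12)` on them writes 20 bytes, while `packCodes(out, codes, 17, 12, 12)` writes 26, which is already more than the 25-byte input. With both widths set to 12, the sketch should produce the same byte layout as the question's `writeBinary`. The reported 27 bytes corresponds to 18 twelve-bit codes; one possible cause, not confirmed here, is that `prefix` is read in `compress` before it has been assigned a value.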

SBS