
I am trying to compress a grayscale image using Huffman coding in MATLAB, and have tried the following code.

I have used a 512x512 grayscale image in TIFF format. My problem is that the compressed image (the length of the compressed bitstream) is larger than the uncompressed image, so the compression ratio comes out less than 1.

```matlab
clc;
clear all;
A1 = imread('fig1.tif');
[M N]=size(A1);
A = A1(:);
count = [0:1:255]; % Distinct data symbols appearing in sig
total=sum(count);
for i=1:1:size((count)');                  
    p(i)=count(i)/total;
end

[dict,avglen]=huffmandict(count,p) % build the Huffman dictionary
comp= huffmanenco(A,dict);         %encode your original image with the dictionary you just built
compression_ratio= (512*512*8)/length(comp)   %computing the compression ratio

%% DECODING
Im = huffmandeco(comp,dict); % Decode the code
I11=uint8(Im);

decomp=reshape(I11,M,N);
imshow(decomp);
```
Asked by parvathy; edited by rayryeng.

1 Answer


There is a slight error in your code. I'm assuming you want to calculate the probability of encountering each pixel intensity, which is the normalized histogram of the image. You're not computing it properly. Specifically:

```matlab
count = [0:1:255]; % Distinct data symbols appearing in sig
total=sum(count);
for i=1:1:size((count)');                  
    p(i)=count(i)/total;
end
```

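`total` is summing the symbol values 0 through 255 rather than counting how often each intensity occurs in your image, so `p` ends up having nothing to do with the image. You're supposed to compute the probability distribution of your image, and you should use `imhist` for that. As such, you should do this instead: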

```matlab
count = 0:255;
p = imhist(A1) / numel(A1);
```

This correctly calculates the probability distribution of your image. Remember, when you're doing Huffman coding, you need to specify the probability of encountering each intensity. Assuming every pixel location is equally likely to be picked, that probability is the image's histogram normalized by the total number of pixels in the image. Try that and see if you get better results.
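To see the difference concretely, here is a minimal sketch that uses only base MATLAB, building the histogram with `accumarray` (the toolbox-free alternative to `imhist` also mentioned in the comments). The tiny image `A1` is made up for illustration, and the names `p_wrong` and `hist_counts` are hypothetical; with a real image you would use the output of `imread` instead.

```matlab
% Hypothetical 4x4 test image: mostly dark pixels (value 10) plus a few others.
A1 = uint8([10 10 10  10;
            10 10 10  10;
            10 10 200 200;
            10 10 50  255]);

count = 0:255;                 % symbols: every possible 8-bit intensity

% Incorrect: normalizes the symbol values themselves and ignores the image,
% so p_wrong(k) = (k-1)/32640 for every image (and intensity 0 gets probability 0).
p_wrong = count / sum(count);

% Correct: normalized histogram of the image (same as imhist(A1)/numel(A1) for uint8).
% The +1 maps intensity 0..255 to MATLAB indices 1..256.
hist_counts = accumarray(double(A1(:)) + 1, 1, [256 1]);
p = hist_counts / numel(A1);

fprintf('P(intensity 10):  wrong = %.6f, correct = %.4f\n', p_wrong(11), p(11));
fprintf('P(intensity 255): wrong = %.6f, correct = %.4f\n', p_wrong(256), p(256));
```

The incorrect version gives intensity 255 the highest probability no matter what the image contains, so the most common (here, dark) pixels get long codewords.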


However, Huffman will only give you good compression ratios if you have frequently occurring symbols. Did you happen to take a look at the histogram or the spread of your pixels in your image?

If the spread is quite large, with very few entries per bin, then Huffman coding will not give you any compression savings. In fact, once the dictionary is accounted for, it may give you a larger size. Bear in mind that TIFF compression uses Huffman coding only as one part of the algorithm; there is also some pre- and post-processing done to further drive down the size.

As a further example, suppose I had an image that consisted of `[0, 1, 2, ..., 255; 0, 1, 2, ..., 255; 0, 1, 2, ..., 255]`; I have 3 rows of [0,255], but really it could be any number of rows. This means that every symbol is equally probable, with probability 1/256, which means that we would need 8 bits per symbol... which is essentially the raw pixel value anyway!
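Working the numbers for that example: each of the 256 intensities has probability $p_k = 1/256$, and the entropy $H$ of the distribution is a lower bound on the average number of bits per symbol that any prefix code, Huffman included, can achieve:

$$
H = -\sum_{k=0}^{255} p_k \log_2 p_k = -\sum_{k=0}^{255} \frac{1}{256}\log_2\frac{1}{256} = \log_2 256 = 8 \text{ bits per pixel}
$$

That equals the 8 bits per pixel of the raw image, so even in the best case the compression ratio is $8/8 = 1$ before the dictionary is counted.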

The key idea behind Huffman coding is that each symbol is replaced by a variable-length group of bits (a codeword), and frequently occurring symbols get assigned shorter codewords. Because the image I described has equiprobable intensities, no symbol is more frequent than any other, so every codeword ends up being 8 bits long. With this, not only will you have to transmit the dictionary, you would effectively be sending one byte per pixel, which is no better than sending the raw byte stream.
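A minimal plain-MATLAB sketch of that effect (no Communications Toolbox, and not how `huffmandict` is implemented internally): it repeatedly merges the two least probable nodes and counts how deep each symbol ends up, which gives the Huffman codeword lengths without building the codewords themselves. The two distributions and the names `dists`, `code_lens` and `avg_len` are hypothetical; the average length should match what `huffmandict` reports as its second output, `avglen`, for the same probabilities.

```matlab
% Two hypothetical symbol distributions to compare.
dists = { [0.5 0.25 0.125 0.125], ...  % skewed: one very common symbol
          ones(1, 256) / 256 };         % flat: 256 equiprobable intensities
avg_len   = zeros(1, numel(dists));
code_lens = cell(1, numel(dists));

for d = 1:numel(dists)
    p = dists{d};
    len = zeros(size(p));              % codeword length of each symbol
    w = p;                             % weights of the active tree nodes
    members = num2cell(1:numel(p));    % symbols contained in each active node
    while numel(w) > 1
        [~, order] = sort(w);          % ascending: the two lightest nodes first
        a = order(1);
        b = order(2);
        merged = [members{a}, members{b}];
        len(merged) = len(merged) + 1; % merging pushes these symbols one level deeper
        w(end+1) = w(a) + w(b);        % add the new parent node
        members{end+1} = merged;
        w([a b]) = [];                 % remove the two merged children
        members([a b]) = [];
    end
    code_lens{d} = len;
    avg_len(d) = sum(p .* len);        % expected bits per symbol
    fprintf('distribution %d: average code length = %.4f bits\n', d, avg_len(d));
end
```

For the skewed distribution the common symbol gets a 1-bit codeword and the average drops below 2 bits; for the flat one every symbol gets 8 bits, so nothing is saved and the dictionary is pure overhead.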

If you want your image to be compressed by raw Huffman coding, the distribution of pixels has to be skewed: for example, most of the intensities in your image are dark, or most of them are bright. If your image has good contrast, or if the pixel intensities are spread evenly across the whole range, then Huffman coding will not give you any compression savings.
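One way to check this before running the encoder is to compute the entropy `H` of the normalized histogram. Huffman's average code length always lies between `H` and `H + 1` bits, so `8 / H` is an upper bound on the ratio raw Huffman can reach, not counting the dictionary. The sketch below uses two made-up 512x512 images (the names `dark` and `flat` are hypothetical): one where every pixel is among the 16 darkest intensities, and one where all 256 intensities occur equally often.

```matlab
% Two hypothetical 512x512 test images.
M = 512; N = 512;
idx  = reshape(0:M*N-1, M, N);
dark = uint8(mod(idx, 16));   % skewed: every pixel is one of the dark intensities 0..15
flat = uint8(mod(idx, 256));  % flat: all 256 intensities appear equally often

imgs  = {dark, flat};
names = {'dark', 'flat'};
H = zeros(1, numel(imgs));
for k = 1:numel(imgs)
    A = imgs{k};
    p = accumarray(double(A(:)) + 1, 1, [256 1]) / numel(A);  % normalized histogram
    nz = p(p > 0);                    % empty bins contribute 0 (0*log2(0) taken as 0)
    H(k) = -sum(nz .* log2(nz));      % entropy in bits per pixel
    fprintf('%s: entropy = %.4f bits/pixel, ratio bound = %.2f\n', names{k}, H(k), 8 / H(k));
end
```

For these two images the entropies are 4 and 8 bits per pixel, i.e. a best-case ratio of 2 for the dark image and 1 (no savings) for the flat one.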

Answered by rayryeng.
  • I tried to run the code. However, when I run it as given in the question, i.e. `count = [0:1:255]; % Distinct data symbols appearing in sig total=sum(count); for i=1:1:size((count)'); p(i)=count(i)/total; end`, the code never ends. When I run it as proposed in the answer, I get an error: `Error using huffmandict (line 171) Source symbols repeat Error in Huffman (line 10) [dict,avglen]=huffmandict(count,p)`. The only difference is that I take an RGB image and convert it to grayscale. Can you help me please? – faith Apr 27 '18 at 19:45
  • @faith `count` is not being computed properly when you reference the original question. I specifically point that out in my answer and advised the OP to use `imhist` instead. If you don't have `imhist`, which is part of the Image Processing Toolbox, use `accumarray`: `p = accumarray(double(A(:)) + 1, 1, [256 1]) / numel(A);`. `A` is the input image. – rayryeng Apr 27 '18 at 20:01
  • Thanks for the rapid response. I already tried running it as you advised, and I have `imhist` as well. However, I got the error `source symbols repeat`. – faith Apr 27 '18 at 20:12
  • @faith Something is wrong with your `count` variable: there are duplicates. Double check that you correctly generated a vector from 0 to 255 in steps of 1. – rayryeng Apr 27 '18 at 21:48
  • Well, I think I'm missing something. You advised generating the `count` variable as `count=imhist(A1)` instead of `count=[0:1:255]`. So where should I use the vector `[0:255]`? – faith Apr 29 '18 at 07:20
  • @faith I was mistaken. `count` should be left the same. It's the `p` that is incorrect, which I've corrected in my post. Please see the edits. Good luck. – rayryeng Apr 29 '18 at 08:44