
Example:

If I have a variable X = [1 2 2 0], what's the correct way of calculating its entropy?

My attempt (using MATLAB):

p1 = 1/4; % probability of observing 1
p2 = 2/4; % probability of observing 2
p0 = 1/4; % probability of observing 0

H = -(1/4*log2(1/4) + 2/4*log2(2/4) + 1/4*log2(1/4)) % = 1.5

The problem, and my confusion, is: should I consider the zero values of X? Using MATLAB's entropy function I get the same value.

Thank you.

Comments:

  • The [documentation for `entropy`](http://uk.mathworks.com/help/images/ref/entropy.html) tells you exactly how it's calculated. If you are getting the same value as a MATLAB built-in (which you consider to be correct?), then what are you asking? – Wolfie Jan 19 '18 at 15:16
  • Probably you missed it, but I ask again: should I consider the zero values of X? – Dirac Jan 19 '18 at 15:21
  • Probably you missed it, but I ask again: what are you asking? You have a method, it returns some value. Is this value correct? Do you not know whether this value is correct? Are you asking whether the MATLAB function `entropy` gives the correct value (since it's the same as your value)? What is the mathematical equation which you are trying to convert to a function? What are `X` or `p` doing in your example, as they are not used? **You need to describe a specific problem, with your expected and current results, to demonstrate what the issue is.** – Wolfie Jan 19 '18 at 15:32
  • I thought that by using the tags (such as entropy) I was filtering... Apparently not. – Dirac Jan 19 '18 at 15:37
  • There are only 25 followers of the `entropy` tag and 93 for `information-extraction` (compared to, say, the 40k for `matlab`), meaning you are unlikely to find a specialist! You need to describe what you're trying to achieve or we can't help you, simple as that. – Wolfie Jan 19 '18 at 17:12
  • Don't use the way I put my question and the number of followers to justify your lack of knowledge on the main topic (entropy). – Dirac Jan 19 '18 at 21:45
  • @Dirac I think you'll find [CrossValidated](https://stats.stackexchange.com/) has the expertise you're looking for if you're not happy here. – SecretAgentMan Nov 14 '18 at 14:40
  • Tell me about it! ;) @SecretAgentMan, thank you for the insight! – Dirac Nov 14 '18 at 14:46

1 Answer


The answer to your question depends on what you are attempting to do.

If X represents the data associated with a greyscale image, then the entropy function (from the Image Processing Toolbox) is what you are looking for:

X = [1 2 2 0];
H = entropy(X); % 0.811278124459133

But neither your X variable nor your expected result (1.5) points to that solution. To me, it seems like you are simply attempting to calculate the Shannon entropy of a vector of values. Hence, you must use a different approach:

X = [1 2 2 0];

% Build the probabilities vector according to X...

X_uni = unique(X);
X_uni_size = numel(X_uni);

P = zeros(X_uni_size,1);

for i = 1:X_uni_size
    P(i) = sum(X == X_uni(i)); % count the occurrences of each unique value
end

P = P ./ numel(X);

% Compute the Shannon entropy

H = -sum(P .* log2(P)); % 1.5
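As a side note, the counting loop can also be written in vectorized form. A minimal sketch of the same idea, using the third output of unique together with accumarray (both part of base MATLAB), which yields the same P:

X = [1 2 2 0];
[~, ~, idx] = unique(X);               % idx maps each element of X to its unique value
P = accumarray(idx(:), 1) ./ numel(X); % occurrence counts, normalized to probabilities
H = -sum(P .* log2(P)); % 1.5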

P must sum to 1, and probabilities (not values) equal to zero must be excluded from the computation. With the code above, it's not possible to produce zero probabilities, since P is built only from values that actually occur in X, so no special handling is needed.
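If, however, P comes from somewhere else and may contain zero entries, they have to be dropped explicitly before applying the formula. A minimal sketch, assuming a hand-built P with a zero entry (in MATLAB, 0*log2(0) evaluates to NaN rather than the conventional limit of 0):

P = [0.25 0.5 0.25 0];
P = P(P > 0);           % exclude zero probabilities
H = -sum(P .* log2(P)); % 1.5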

Why are the results different? That's very simple to explain. In the first example (the one that uses the entropy function), MATLAB is forced to treat X as a greyscale image (a matrix whose values are expected to lie either between 0 and 1, for floating-point types, or between 0 and 255, for 8-bit integer types). Since the underlying type of X is double, the variable is internally converted by the function im2uint8 so that all its values fall within the correct range of a greyscale image... thus obtaining:

X = [255 255 255 0];
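You can check this conversion directly. A quick sketch, assuming the Image Processing Toolbox is available (both entropy and im2uint8 belong to it):

X = [1 2 2 0];
X_conv = im2uint8(X) % [255 255 255 0]: values are clipped to [0, 1], then scaled to [0, 255]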

This produces a different vector of probabilities, equal to:

P = [0.25 0.75]; 

which gives a Shannon entropy of 0.811278124459133.
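To double-check, plugging this probability vector into the same formula used above reproduces the value returned by entropy:

P = [0.25 0.75];
H = -sum(P .* log2(P)); % 0.811278124459133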

Tommaso Belluzzo