Questions tagged [entropy]

Entropy is a measure of the uncertainty in a random variable.

The term usually refers to the Shannon entropy, which quantifies the expected value of the information contained in a message. Entropy is typically measured in bits, nats, or bans. Shannon entropy is the average unpredictability in a random variable, which is equivalent to its information content.
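
For a discrete random variable X with outcome probabilities p(x), the Shannon entropy is H(X) = -sum over x of p(x) * log2 p(x), measured in bits when the base-2 logarithm is used (nats for the natural log, bans for base 10). A minimal Python sketch of the definition:

    import math

    def shannon_entropy(probs):
        """Shannon entropy in bits of a discrete distribution (probabilities sum to 1)."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    print(shannon_entropy([0.5, 0.5]))  # 1.0 bit: a fair coin
    print(shannon_entropy([0.9, 0.1]))  # ~0.47 bits: a biased coin is more predictable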

596 questions
6
votes
3 answers

How to calculate entropy from np.histogram

I have an example of a histogram with: mu1 = 10, sigma1 = 10 s1 = np.random.normal(mu1, sigma1, 100000) and calculated hist1 = np.histogram(s1, bins=50, range=(-10,10), density=True) for i in hist1[0]: ent = -sum(i * log(abs(i))) print…
Vinci
  • 365
  • 1
  • 6
  • 16
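
With density=True, np.histogram returns a probability density per bin, not a probability, and the loop in the excerpt takes the log of raw (possibly zero) density values. A sketch of one common fix, assuming the question's setup: multiply each density by its bin width to get a probability mass, drop empty bins, then apply the entropy formula once over the whole array:

    import numpy as np

    mu1, sigma1 = 10, 10
    s1 = np.random.normal(mu1, sigma1, 100000)
    hist1 = np.histogram(s1, bins=50, range=(-10, 10), density=True)

    widths = np.diff(hist1[1])       # bin widths from the returned edges
    p = hist1[0] * widths            # density * width = probability mass per bin
    p = p[p > 0]                     # log(0) is undefined; skip empty bins
    ent = -np.sum(p * np.log(p))     # entropy in nats; use np.log2 for bits
    print(ent)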
6
votes
2 answers

How to calculate clustering entropy? A working example or software code

I would like to calculate the entropy of this example scheme: http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html Can anybody please explain step by step with real values? I know there are an unlimited number of formulas but I…
Furkan Gözükara
  • 22,964
  • 77
  • 205
  • 342
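
The linked chapter scores a flat clustering against gold-standard classes; the entropy of the clustering is the size-weighted average of each cluster's class-distribution entropy. A sketch with concrete values, assuming the class counts of the book's Figure 16.4 example (cluster 1: 5 x and 1 o; cluster 2: 1 x, 4 o, 1 d; cluster 3: 2 x and 3 d):

    import math

    # rows = clusters, columns = gold classes (x, o, d); counts assumed from IIR Fig. 16.4
    clusters = [
        [5, 1, 0],
        [1, 4, 1],
        [2, 0, 3],
    ]
    n = sum(map(sum, clusters))

    def cluster_entropy(counts):
        total = sum(counts)
        return -sum((c / total) * math.log2(c / total) for c in counts if c)

    # weighted average over clusters: ~0.96 bits for the assumed counts
    H = sum((sum(c) / n) * cluster_entropy(c) for c in clusters)
    print(H)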
6
votes
4 answers

How can I determine the statistical randomness of a binary string?

How can I determine the statistical randomness of a binary string? Ergo, how can I code my own test, and return a single value that corresponds to the statistical randomness, a value between 0 and 1.0 (0 being not random, 1.0 being random)? The test…
Tim
  • 997
  • 2
  • 11
  • 17
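
One crude score in [0, 1] is the Shannon entropy per bit: 1.0 for a perfectly balanced string, 0.0 for a constant one. It only detects frequency bias, not structure ("0101..." scores 1.0), which is why real suites such as NIST SP 800-22 combine many tests. A minimal sketch:

    import math

    def bit_entropy(bits: str) -> float:
        """Score in [0, 1]: Shannon entropy per bit of a string of '0'/'1' characters."""
        n = len(bits)
        score = 0.0
        for count in (bits.count("0"), bits.count("1")):
            if count:
                p = count / n
                score -= p * math.log2(p)
        return score

    print(bit_entropy("0000000000"))  # 0.0: not random at all
    print(bit_entropy("0110100111"))  # ~0.97: near-balanced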
6
votes
2 answers

Are decision trees trying to maximize information gain or entropy?

I understand that decision trees try to put classifiers with high entropy high on the decision tree. However, how does information gain play into this? Information gain is defined as: InformationGain = EntropyBefore - EntropyAfter Does a decision…
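
The two objectives coincide: EntropyBefore is fixed at a node, so maximizing InformationGain = EntropyBefore - EntropyAfter is the same as minimizing the children's weighted entropy, and the attribute with the highest gain is placed at the node. A sketch of the computation for one hypothetical binary split:

    import math

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * math.log2(c / n)
                    for c in (labels.count(0), labels.count(1)) if c)

    def information_gain(parent, left, right):
        """EntropyBefore minus the size-weighted EntropyAfter of a candidate split."""
        n = len(parent)
        after = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        return entropy(parent) - after

    parent = [0, 0, 1, 1, 1, 1]
    print(information_gain(parent, [0, 0], [1, 1, 1, 1]))  # pure split: gain = H(parent)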
5
votes
9 answers

How to generate a number in arbitrary range using random()={0..1} preserving uniformness and density?

Generate a random number in the range [x..y], where x and y are arbitrary floating point numbers. Use the function random(), which returns a random floating point number in the range [0..1] from P uniformly distributed numbers (call it "density"). Uniform…
psihodelia
  • 29,566
  • 35
  • 108
  • 157
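
The affine map x + (y - x) * random() preserves uniformity; if random() can return P equally spaced values, the result still has at most P distinct values, so the spacing between representable outputs grows to (y - x)/P. A minimal sketch:

    import random

    def uniform_in_range(x: float, y: float) -> float:
        """Map random() in [0, 1] affinely onto [x, y]; uniformity is preserved."""
        return x + (y - x) * random.random()

    print(uniform_in_range(-5.0, 3.0))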
5
votes
3 answers

Optimal way to compress 60 bit string

Given 15 random hexadecimal digits (60 bits), where there is always at least 1 duplicate in every 20-bit run (5 hexadecimal digits), what is the optimal way to compress the bytes? Here are some examples: 01230 45647 789AA D8D9F 8AAAF 21052 20D22 8CC56…
ParoX
  • 5,685
  • 23
  • 81
  • 152
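
A zero-order entropy estimate of the digit distribution bounds what plain Huffman or arithmetic coding could achieve; an optimal scheme for this particular source would additionally have to exploit the guaranteed duplicate in every 5-digit run. A sketch that measures the zero-order baseline on the runs quoted above:

    import math
    from collections import Counter

    runs = ["01230", "45647", "789AA", "D8D9F", "8AAAF", "21052", "20D22", "8CC56"]
    counts = Counter("".join(runs))
    n = sum(counts.values())

    # zero-order entropy in bits per hex digit (4.0 would mean incompressible)
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    print(f"{h:.3f} bits/digit, ~{h * 15:.1f} bits per 60-bit string")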
5
votes
1 answer

How to calculate the log2 of an integer in C as precisely as possible with bitwise operations

I need to calculate the entropy, and due to the limitations of my system I need to use restricted C features (no loops, no floating-point support) while getting as much precision as possible. From here I figured out how to estimate the floor log2 of an…
ascub
  • 95
  • 1
  • 11
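
One classical integer-only technique: take floor(log2 n) from the top bit's position, then produce fractional bits by repeatedly squaring the normalized mantissa in fixed point. Sketched in Python to match the other examples (the question's constraints would require unrolling the loop in C, but the arithmetic is the same integer/bitwise operations):

    def log2_fixed(n: int, frac_bits: int = 8) -> int:
        """log2(n) as a fixed-point integer with frac_bits fractional bits."""
        assert n > 0
        ip = n.bit_length() - 1            # integer part: floor(log2 n)
        m = (n << 16) >> ip                # mantissa n / 2^ip in [1, 2), scaled by 2^16
        result = ip << frac_bits
        for i in range(frac_bits):         # each squaring yields one output bit
            m = (m * m) >> 16              # square in fixed point
            if m >= (2 << 16):             # mantissa in [2, 4): emit a 1 and renormalize
                m >>= 1
                result |= 1 << (frac_bits - 1 - i)
        return result

    print(log2_fixed(10) / 256)            # ~3.32 (true log2(10) ~ 3.3219)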
5
votes
2 answers

Get, or calculate the entropy of an image with Ruby and imagemagick

How to find the "entropy" with ImageMagick, preferably mini_magick, in Ruby? I need this as part of a larger project: finding "interestingness" in an image so as to crop it. I found a good example in Python/Django, which gives the following…
berkes
  • 26,996
  • 27
  • 115
  • 206
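
The usual definition for an image is the Shannon entropy of its grayscale histogram, which is what the Python/Django snippet the question refers to computes; the same histogram is obtainable through ImageMagick/MiniMagick. A sketch of that formulation using Pillow, with a hypothetical file name:

    import math
    from PIL import Image

    def image_entropy(path: str) -> float:
        """Shannon entropy (bits per pixel) of an image's grayscale histogram."""
        hist = Image.open(path).convert("L").histogram()  # 256 luminance counts
        total = sum(hist)
        return -sum((c / total) * math.log2(c / total) for c in hist if c)

    print(image_entropy("photo.jpg"))  # "photo.jpg" is a placeholder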
5
votes
1 answer

Binning data via DecisionTreeClassifier in sklearn?

Suppose I have a data set: X y 20 0 22 0 24 1 27 0 30 1 40 1 20 0 ... I try to discretize X into a few bins by minimizing the entropy, so I did the following: clf =…
user6396
  • 1,832
  • 6
  • 23
  • 38
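
A tree grown with criterion="entropy" and a cap on the number of leaves picks entropy-minimizing cut points, and those cut points can be read back from tree_.threshold (leaf nodes carry the placeholder value -2). A sketch with the question's data:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    X = np.array([20, 22, 24, 27, 30, 40, 20]).reshape(-1, 1)
    y = np.array([0, 0, 1, 0, 1, 1, 0])

    clf = DecisionTreeClassifier(criterion="entropy", max_leaf_nodes=3, random_state=0)
    clf.fit(X, y)

    # internal nodes hold the learned split thresholds; these are the bin edges
    edges = sorted(t for t in clf.tree_.threshold if t != -2)
    print(edges)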
5
votes
1 answer

Can Kullback-Leibler be applied to compare two images?

I know that KL divergence is not a metric and cannot be treated as one. However, is it possible to use KL to measure how one image varies from another? I am trying to make intuitive sense of this. Thanks in advance for all responses.
troymyname00
  • 670
  • 1
  • 14
  • 32
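
It can be done by normalizing each image's histogram into a probability distribution and computing KL between the two; since KL is asymmetric and blows up on empty bins, a small epsilon (or the symmetric Jensen-Shannon divergence) is commonly used. A sketch with scipy.stats.entropy, using random arrays as stand-ins for real images:

    import numpy as np
    from scipy.stats import entropy

    def histogram_dist(img: np.ndarray, bins: int = 256) -> np.ndarray:
        """Normalized grayscale histogram; epsilon smoothing keeps KL finite."""
        hist, _ = np.histogram(img, bins=bins, range=(0, 256))
        p = hist.astype(float) + 1e-9
        return p / p.sum()

    a = np.random.randint(0, 256, (64, 64))   # stand-in for image 1
    b = np.random.randint(0, 256, (64, 64))   # stand-in for image 2
    print(entropy(histogram_dist(a), histogram_dist(b)))  # KL(a || b); note the asymmetry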
5
votes
1 answer

Why is the entropy of a uniform distribution lower than that of repeated values in R?

According to Wikipedia, the uniform distribution is the "maximum entropy probability distribution". Thus, if I have two sequences (one uniformly distributed and one with repeated values), both of length k, then I would expect the entropy of the…
Alpha Bravo
  • 170
  • 12
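
A likely explanation (an assumption about the question's setup): R's entropy() from the entropy package treats its argument as a vector of bin counts, not as raw samples, so a uniformly distributed sequence must be tabulated into counts first. The distinction, sketched in Python:

    import math
    from collections import Counter

    def entropy_of_sample(xs) -> float:
        """Tabulate the sample into counts first, then apply the Shannon formula."""
        n = len(xs)
        return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

    uniform = [1, 2, 3, 4, 5, 6, 7, 8]   # each value once: maximal entropy (3 bits)
    repeated = [1, 1, 1, 1, 1, 1, 1, 2]  # heavily repeated: ~0.54 bits
    print(entropy_of_sample(uniform), entropy_of_sample(repeated))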
5
votes
5 answers

Random 256-bit key using SecRandomCopyBytes() in iOS

I have been using UUIDString as an encryption key for the files stored on my iPad, but the security review done on my app by a third party suggested the following. With the launch of the application, a global database key is generated and stored in…
Ankit Srivastava
  • 12,347
  • 11
  • 63
  • 115
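
The review is asking for 32 bytes straight from the system CSPRNG rather than a UUID string (a version-4 UUID carries only 122 random bits, and its formatted text is not key material). On iOS that is SecRandomCopyBytes(kSecRandomDefault, 32, &bytes); the same idea, sketched in Python for consistency with the other examples:

    import secrets

    key = secrets.token_bytes(32)  # 256 bits from the OS CSPRNG
    print(key.hex())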
5
votes
1 answer

n-gram Markov chain transition table

I'm trying to build an n-gram Markov model from a given piece of text, and then access the transition table for it so I can calculate the conditional entropy for each sequence of words of length n (the grams). For example, in a 2-gram model, after…
Dan
  • 105
  • 1
  • 5
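
A dict of Counters keyed by the length-n context is one straightforward transition table; the conditional entropy H(next | context) is then the context-frequency-weighted entropy of each row. A sketch for a model with a one-word context:

    import math
    from collections import Counter, defaultdict

    def conditional_entropy(text: str, n: int = 1) -> float:
        """H(next word | previous n words) from a dict-of-Counters transition table."""
        words = text.split()
        table = defaultdict(Counter)
        for i in range(len(words) - n):
            table[tuple(words[i:i + n])][words[i + n]] += 1

        total = sum(sum(row.values()) for row in table.values())
        h = 0.0
        for row in table.values():
            row_total = sum(row.values())
            row_h = -sum((c / row_total) * math.log2(c / row_total)
                         for c in row.values())
            h += (row_total / total) * row_h
        return h

    print(conditional_entropy("the cat sat on the mat the cat ran"))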
5
votes
1 answer

Tsallis entropy for continuous variable in R

Tsallis entropy for a discrete variable is defined by: H[p,q] = 1/(q-1) * (1 - sum(p^q)) Tsallis entropy for a continuous variable is defined by: H[p,q] = 1/(q-1) * (1 - int(p(x)^q dx)) where p(x) is the probability density function of the data, and int is…
Tommaso
  • 527
  • 1
  • 5
  • 17
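
For the continuous case, one practical route is to estimate p(x) with a kernel density estimate and integrate p(x)^q numerically; sketched in Python with scipy (the R analogue would build on density() and integrate()):

    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import gaussian_kde

    def tsallis_entropy(data: np.ndarray, q: float) -> float:
        """H[p,q] = 1/(q-1) * (1 - int(p(x)^q dx)), with p(x) from a Gaussian KDE."""
        kde = gaussian_kde(data)
        lo, hi = data.min() - 3 * data.std(), data.max() + 3 * data.std()
        integral, _ = quad(lambda x: kde(x)[0] ** q, lo, hi)
        return (1.0 - integral) / (q - 1.0)

    print(tsallis_entropy(np.random.normal(size=1000), q=2.0))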
5
votes
1 answer

Need faster python code for calculating sample entropy

This is a problem I have faced while writing Python code for sample entropy: map(max, abs(a[i]-a)) is very slow. Is there any other function that performs better than map(), where a is an ndarray that looks like np.array([…
Xin Li
  • 61
  • 5
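
The per-row map(max, abs(a[i] - a)) has a direct vectorized counterpart, and broadcasting can produce every pairwise Chebyshev distance in one shot. A sketch, assuming a is a 2-D array of template vectors as in sample entropy:

    import numpy as np

    a = np.random.rand(500, 3)  # stand-in for the question's template vectors

    # vectorized replacement for map(max, abs(a[i] - a)):
    i = 0
    d_row = np.max(np.abs(a[i] - a), axis=1)

    # or all pairwise Chebyshev distances at once via broadcasting
    d_all = np.max(np.abs(a[:, None, :] - a[None, :, :]), axis=-1)
    assert np.allclose(d_all[i], d_row)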