private double log(double num, int base){
   return Math.log(num)/Math.log(base);
}

public double entropy(List<String> data){

    double entropy = 0.0;
    double prob = 0.0;

    if(this.iFrequency.getKeys().length==0){
        this.setInterestedFrequency(data);
    }


    String[] keys = iFrequency.getKeys();

    for(int i=0;i<keys.length;i++){

        prob = iFrequency.getPct(keys[i]);
        entropy = entropy - prob * log(prob,2);
    }

    iFrequency.clear();
    return entropy;
}

I wrote a function that calculates the entropy of a data set. The function works and the math is correct. That would be fine if I were working with small data sets, but I'm using this function on sets that have thousands or tens of thousands of members, and it runs slowly.

Are there any algorithms other than the one that I'm using that can be used to calculate the entropy of a set? If not, are there any optimizations that I can add to my code to make it run faster?

I found this question, but the answers didn't really go into detail.

j.jerrod.taylor

1 Answer


First of all, it appears that you've built an O(N^2) algorithm: every call to getPct recomputes the sum of the counts. I recommend two changes: (1) Sum the counts once and store the value, then compute prob directly as value[i] / sum. (2) You'll save a small amount of time if you accumulate the sum of prob * Math.log(prob) using the natural log and, when you're all done, negate it and divide once by Math.log(2).
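The two suggestions can be sketched roughly as follows. This is only an illustration, not the asker's actual class: the question doesn't show the type of iFrequency, so a plain HashMap stands in for it here, and the class name EntropySketch is made up for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EntropySketch {

    // Shannon entropy, in bits, of the values in data.
    // Assumption: a HashMap of counts replaces the question's iFrequency object.
    public static double entropy(List<String> data) {
        if (data.isEmpty()) {
            return 0.0;
        }

        // One pass over the data to count each distinct value.
        Map<String, Integer> counts = new HashMap<>();
        for (String item : data) {
            counts.merge(item, 1, Integer::sum);
        }

        // (1) The total is computed once, not once per key.
        double total = data.size();

        // (2) Accumulate -p * ln(p) with the natural log only.
        double sum = 0.0;
        for (int count : counts.values()) {
            double prob = count / total;
            sum -= prob * Math.log(prob);
        }

        // Convert from nats to bits with a single division.
        return sum / Math.log(2);
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "b", "a", "b");
        System.out.println(entropy(data));
    }
}
```

With this shape the work is one pass over the data plus one pass over the distinct keys, so the running time grows linearly with the size of the set rather than quadratically with the number of keys.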

Prune