I am trying to use Apache Commons Math for kernel density estimation on a group of values. One bin happens to contain only one value, and when I call cumulativeProbability() I get a NotStrictlyPositiveException. Is there any way to prevent this? I can't be sure that all the bins will have at least one value.

Thanks.

barisdad

2 Answers

Given that this bug is still there, I wrote my own implementation of the EmpiricalDistribution class, following their guidelines. I only re-implemented the functionality that I needed, i.e. computing the entropy of a distribution, but you can easily extend it to your needs.

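// Imports assume the Commons Math 3.x package layout.
import org.apache.commons.math3.stat.descriptive.moment.Mean;
import org.apache.commons.math3.stat.descriptive.moment.StandardDeviation;
import org.apache.commons.math3.util.FastMath;

public class EmpiricalDistribution {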

    private double[] values;
    private int[] binCountArray;
    private double maxValue, minValue;
    private double mean, stDev;

    public EmpiricalDistribution(double[] values) {
        this.values = values;
        // Rule of thumb: about one bin per 10 values, but never fewer than one bin.
        int binCount = Math.max(1, (int) Math.round(values.length / 10.0));
        binCountArray = new int[binCount];

        maxValue = Double.NEGATIVE_INFINITY;
        minValue = Double.POSITIVE_INFINITY;
        for (double value : values) {
            if (value > maxValue) maxValue = value;
            if (value < minValue) minValue = value;
        }

        double binRange = (maxValue - minValue) / binCount;
        for (double value : values) {
            int bin = (int) ((value - minValue) / binRange);
            bin = Math.min(binCountArray.length - 1, bin);
            binCountArray[bin]++;
        }

        mean = (new Mean()).evaluate(values);
        stDev = (new StandardDeviation()).evaluate(values, mean);
    }

    public double getEntropy() {
        double entropy = 0;
        for (int valuesInBin : binCountArray) {
            if (valuesInBin == 0) continue;

            double binProbability = valuesInBin / (double) values.length;
            entropy -= binProbability * FastMath.log(2, binProbability);
        }

        return entropy;
    }

    public double getMean() {
        return mean;
    }

    public double getStandardDeviation() {
        return stDev;
    }

}
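
As a quick sanity check of the binning and entropy steps above, here is a minimal sketch that uses only the standard library. The class and method names (`EntropySketch`, `histogram`, `entropyBits`) are made up for illustration and are not part of Commons Math. Four values spread evenly over four bins should give an entropy of about 2 bits, and a sample where every value is identical should give 0.

```java
final class EntropySketch {

    /** Equal-width histogram over [min, max]; the maximum is clamped into the last bin. */
    static int[] histogram(double[] values, int binCount) {
        if (binCount < 1) {
            throw new IllegalArgumentException("binCount must be at least 1");
        }
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double width = (max - min) / binCount;
        int[] counts = new int[binCount];
        for (double v : values) {
            // If all values are equal the width is 0, so everything goes to bin 0.
            int bin = width > 0 ? (int) ((v - min) / width) : 0;
            counts[Math.min(binCount - 1, bin)]++;
        }
        return counts;
    }

    /** Shannon entropy in bits of the bin frequencies; empty bins contribute nothing. */
    static double entropyBits(int[] counts, int total) {
        double entropy = 0;
        for (int c : counts) {
            if (c == 0) continue;
            double p = c / (double) total;
            entropy -= p * Math.log(p) / Math.log(2);
        }
        return entropy;
    }

    public static void main(String[] args) {
        double[] spread = {0, 1, 2, 3};
        System.out.println(entropyBits(histogram(spread, 4), spread.length));

        double[] constant = {5, 5, 5};
        System.out.println(entropyBits(histogram(constant, 2), constant.length));
    }
}
```

The base-2 logarithm is computed as `Math.log(p) / Math.log(2)` so the sketch does not depend on `FastMath`.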
Alphaaa

I get the same error with one of my distributions.

Reading the Javadoc of this class, it says the following:

USAGE NOTES:
The binCount is set by default to 1000.  A good rule of thumb
is to set the bin count to approximately the length of the input 
file divided by 10.

I've initialised my EmpiricalDistribution with a binCount equal to 10% of my initial data length, and now everything is working OK:

double[] baseLine = getBaseLineValues();
...
// Initialise binCount to about a tenth of the sample size (at least 1)
distribution = new EmpiricalDistribution(Math.max(1, baseLine.length / 10));
// Load base line data
distribution.load(baseLine);
// Now you can obtain random values based on this distribution
double randomValue = distribution.getNextValue();
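
One way to see why a smaller bin count can help is to count how many bins end up holding a single value. The sketch below uses only the standard library and rests on two assumptions that are not confirmed in this thread: that the exception is triggered by bins too sparse to have a positive standard deviation, and that simple equal-width binning is close enough to what the library does internally. `SparseBinCheck` and `singletonBins` are hypothetical names.

```java
final class SparseBinCheck {

    /** Counts the equal-width bins that contain exactly one value. */
    static int singletonBins(double[] values, int binCount) {
        if (binCount < 1) {
            throw new IllegalArgumentException("binCount must be at least 1");
        }
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double width = (max - min) / binCount;
        int[] counts = new int[binCount];
        for (double v : values) {
            // Assumed binning rule: floor of the offset, maximum clamped to the last bin.
            int bin = width > 0 ? (int) ((v - min) / width) : 0;
            counts[Math.min(binCount - 1, bin)]++;
        }
        int singletons = 0;
        for (int c : counts) {
            if (c == 1) singletons++;
        }
        return singletons;
    }

    public static void main(String[] args) {
        // 50 evenly spaced sample values: 0, 1, ..., 49.
        double[] data = new double[50];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        // Default-sized bin count versus the length / 10 rule of thumb.
        System.out.println("binCount 1000: " + singletonBins(data, 1000));
        System.out.println("binCount n/10: " + singletonBins(data, Math.max(1, data.length / 10)));
    }
}
```

With 50 evenly spaced values, 1000 bins leave every value alone in its own bin, while 5 bins hold 10 values each. Real data is rarely this regular, so a check like this can be run on the actual sample before picking a bin count.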
jfcorugedo