
I was wondering whether anyone more experienced than me has an idea for improving the efficiency of computing the basic statistics of an array (max, mean, standard deviation, median, min). What I have come up with is on the order of O(n log n) for one sort, plus 2n for the two loops that compute the mean and the standard deviation.

  • My objective is to extract the summary statistics of many arrays.

Below is what I have so far:

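import java.time.Duration;
import java.time.Instant;
import java.util.Arrays;
// assertTrue is assumed to come from the test framework in use (e.g. JUnit).

void testGetSummaryStatisticsSpeed() {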
    int numVals=1000;
    double[] ar=new double[numVals];
    int numCalculations=2*1000*1*1000;
    Instant start = Instant.now();
    for(int i=0;i<numCalculations;i++){
        for(int k=0;k<numVals;k++){
            ar[k]=Math.random();//To simulate the actual function of my
            // use case
        }
        double[] stats=getSummaryStatistics(ar);
    }
    Instant end = Instant.now();
    long totalTime = Duration.between(start, end).toSeconds();
    System.out.println("Time (s)" + totalTime);
    assertTrue(totalTime<7*1.2);
}


 public static double[] getSummaryStatistics(double[] a) {
    double[] summary = new double[5];
    if (a.length == 0) {
        throw new IllegalArgumentException("Array is empty, please verify the values.");
    }else if(a.length == 1){
        summary[0] = a[0];
        summary[1] = a[0];
        summary[2] = 0;
        summary[3] = a[0];
        summary[4] = a[0];
    }else {
        double[] meandStd = calcMeanSDSample(a);
        summary[1] = meandStd[0];
        summary[2] = meandStd[1];
        double[] maxMinMedian = calcMaxMinMedian(a);
        summary[0] = maxMinMedian[0];
        summary[4] = maxMinMedian[1];
        summary[3] = maxMinMedian[2];
    }
    return summary;
}


public static double[] calcMeanSDSample(double numArray[]) {
    int length = numArray.length;
    double[] meanStd = new double[2];
    if (length == 0) {
        throw new IllegalArgumentException("Array is empty, please verify the values.");
    } else if (length == 1) {
        meanStd[0] = numArray[0];
        meanStd[1] = 0.0;
    } else {
        double sum = 0.0, standardDeviation = 0.0;

        for (double num : numArray) {
            sum += num;
        }

        meanStd[0] = sum / length;

        for (double num : numArray) {
            standardDeviation += Math.pow(num - meanStd[0], 2);
        }
        // Divide by (length - 1) because this is the sample standard deviation.
        meanStd[1] = Math.sqrt(standardDeviation / ((double) length - 1.0));
    }
    return meanStd;
}


public static double[] calcMaxMinMedian(double[] a) {
    double[] maxMinMedian = new double[3];
    if (a.length == 0) {
        throw new IllegalArgumentException("Array is empty, please verify the values.");
    } else if (a.length == 1) {
        for (int i = 0; i < 3; i++) {
            maxMinMedian[i] = a[0];
        }
    } else {
        Arrays.sort(a); // note: this sorts the caller's array in place
        maxMinMedian[0] = a[a.length - 1];
        maxMinMedian[1] = a[0];
        // Odd length: middle element; even length: average of the two middle elements.
        maxMinMedian[2] = (a.length % 2 != 0) ? a[a.length / 2] : (a[(a.length - 1) / 2] + a[a.length / 2]) / 2.0;
    }
    return maxMinMedian;
}

I was thinking of computing the stats inside the loop for(int k=0;k<numVals;k++) in testGetSummaryStatisticsSpeed(), but the best I could reach that way is still O(n log n) (by adding the mean calculation directly there), because the sort remains. I am not sure whether there is something else I could try, so I would appreciate any advice from the pros.
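As a rough illustration of folding the calculation into the generation loop, the sketch below accumulates min, max, mean and sample standard deviation one value at a time using Welford's update, so no second pass over the array is needed. The class name RunningStats and its methods are hypothetical and not part of the code above. The median is deliberately left out, since it still needs the values themselves.

```java
/** Hypothetical single-pass accumulator (Welford's update); not part of the original code. */
final class RunningStats {
    private long count = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // running sum of squared deviations from the current mean
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    /** Folds one value into the running statistics in O(1). */
    void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean); // uses the mean before and after the update
        if (x < min) min = x;
        if (x > max) max = x;
    }

    long count() { return count; }

    double mean() { requireData(); return mean; }

    double min() { requireData(); return min; }

    double max() { requireData(); return max; }

    /** Sample standard deviation (n - 1 denominator); 0 for a single value, as in the code above. */
    double sampleStdDev() {
        requireData();
        return count < 2 ? 0.0 : Math.sqrt(m2 / (count - 1));
    }

    private void requireData() {
        if (count == 0) {
            throw new IllegalStateException("No values added.");
        }
    }

    public static void main(String[] args) {
        RunningStats stats = new RunningStats();
        for (int k = 0; k < 1000; k++) {
            stats.add(Math.random()); // stands in for the real per-element computation
        }
        System.out.println("mean=" + stats.mean() + " sd=" + stats.sampleStdDev()
                + " min=" + stats.min() + " max=" + stats.max());
    }
}
```

Calling add(...) at the point where ar[k] is assigned would remove the two extra loops in calcMeanSDSample; whether that is measurably faster in this benchmark is something only a measurement can tell.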

Thank you in advance

DTK
  • 2
    You’re going to need to sort for median - no avoiding that. You don’t need two passes for the mean. – Boris the Spider Oct 09 '21 at 19:37
  • 4
    You can calculate min, max, mean, variance, standard deviation, skewness, and kurtosis in [a single pass](https://www.johndcook.com/blog/skewness_kurtosis/). – David Conrad Oct 09 '21 at 20:35
  • 2
    First of all, you should replace the return types with dedicated classes having meaningfully named fields, instead of double arrays. That will help readability, robustness, *and* performance. And `calcMaxMinMedian` doesn't need special handling code for array length one. – Holger Oct 10 '21 at 14:58
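
One possible reading of the last comment is sketched below: a small result type with named fields instead of a double[5], and a median expression that works for any non-empty length, so min, max and median need no special case for a single element. The class name Summary, its field names and the factory method of(...) are assumptions made for illustration. The sketch sorts a copy so the caller's array stays untouched, which costs an extra O(n) copy that the original in-place sort avoids.

```java
import java.util.Arrays;

/** Hypothetical result type with named fields; name and layout are assumptions. */
final class Summary {
    final double min;
    final double max;
    final double mean;
    final double stdDev; // sample standard deviation (n - 1 denominator)
    final double median;

    private Summary(double min, double max, double mean, double stdDev, double median) {
        this.min = min;
        this.max = max;
        this.mean = mean;
        this.stdDev = stdDev;
        this.median = median;
    }

    static Summary of(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("Array is empty, please verify the values.");
        }
        double[] sorted = values.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int n = sorted.length;

        double sum = 0.0;
        for (double v : sorted) {
            sum += v;
        }
        double mean = sum / n;

        double squares = 0.0;
        for (double v : sorted) {
            double d = v - mean;
            squares += d * d;
        }
        // A single value has no spread; report 0 as the original code does.
        double stdDev = n > 1 ? Math.sqrt(squares / (n - 1)) : 0.0;

        // For odd n both indices coincide, so this covers odd, even and n == 1.
        double median = (sorted[(n - 1) / 2] + sorted[n / 2]) / 2.0;

        return new Summary(sorted[0], sorted[n - 1], mean, stdDev, median);
    }

    public static void main(String[] args) {
        Summary s = Summary.of(new double[] {4.0, 1.0, 3.0, 2.0});
        System.out.println("min=" + s.min + " max=" + s.max + " mean=" + s.mean
                + " sd=" + s.stdDev + " median=" + s.median);
    }
}
```

With named fields a caller reads s.median rather than remembering that index 3 of the array holds the median.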
