I was wondering whether anyone more experienced than me has an idea for improving the efficiency of computing the basic statistics of an array (max, mean, standard deviation, median, min). What I have come up with costs roughly O(n log n) for one sort, plus 2n for the two loops of the mean and standard deviation calculation.
- My objective is to extract the summary statistics of many arrays.
Below is what I have so far:
import java.time.Duration;
import java.time.Instant;
import java.util.Arrays;

void testGetSummaryStatisticsSpeed() {
int numVals=1000;
double[] ar=new double[numVals];
int numCalculations=2*1000*1*1000;
Instant start = Instant.now();
for(int i=0;i<numCalculations;i++){
for(int k=0;k<numVals;k++){
ar[k]=Math.random();// To simulate the actual function of my use case
}
double[] stats=getSummaryStatistics(ar);
}
Instant end = Instant.now();
long totalTime = Duration.between(start, end).toSeconds();
System.out.println("Time (s)" + totalTime);
assertTrue(totalTime<7*1.2);
}
public static double[] getSummaryStatistics(double[] a) {
double[] summary = new double[5];
if (a.length == 0) {
throw new IllegalArgumentException("Array is empty, please " + "verify" + " the values.");
}else if(a.length == 1){
summary[0] = a[0];
summary[1] = a[0];
summary[2] = 0;
summary[3] = a[0];
summary[4] = a[0];
}else {
double[] meandStd = calcMeanSDSample(a);
summary[1] = meandStd[0];
summary[2] = meandStd[1];
double[] maxMinMedian = calcMaxMinMedian(a);
summary[0] = maxMinMedian[0];
summary[4] = maxMinMedian[1];
summary[3] = maxMinMedian[2];
}
return summary;
}
public static double[] calcMeanSDSample(double numArray[]) {
int length = numArray.length;
double[] meanStd = new double[2];
if (length == 0) {
throw new IllegalArgumentException("Array is empty, please " + "verify" + " the values.");
} else if (length == 1) {
meanStd[0] = numArray[0];
meanStd[1] = 0.0;
} else {
double sum = 0.0, standardDeviation = 0.0;
for (double num : numArray) {
sum += num;
}
meanStd[0] = sum / length;
for (double num : numArray) {
standardDeviation += Math.pow(num - meanStd[0], 2);
}
// Divide by (n - 1) because this is the sample standard deviation.
meanStd[1] = Math.sqrt(standardDeviation / (length - 1.0));
}
return meanStd;
}
public static double[] calcMaxMinMedian(double[] a) {
double[] maxMinMedian = new double[3];
if (a.length == 0) {
throw new IllegalArgumentException("Array is empty, please " + "verify" + " the values.");
} else if (a.length == 1) {
for (int i = 0; i < 3; i++) {
maxMinMedian[i] = a[0];
}
} else {
Arrays.sort(a);
maxMinMedian[0] = a[a.length - 1];
maxMinMedian[1] = a[0];
maxMinMedian[2] = (a.length % 2 != 0) ? a[a.length / 2] : (a[(a.length - 1) / 2] + a[a.length / 2]) / 2.0;
}
return maxMinMedian;
}
I was thinking of computing the statistics inside the loop for(int k=0;k<numVals;k++) in testGetSummaryStatisticsSpeed(), but the best I would be able to reach is still O(n log n) (by adding the mean calculation directly there), since the sort for the median remains. However, I am not sure whether there is something else I could try that you pros would advise.
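To make the idea concrete, here is a minimal sketch of folding max, min, mean and sample standard deviation into a single pass. It assumes Welford's online update for the mean and variance, which is a swapped-in technique and not what my code above does. The class name OnePassStats and the method name summarize are hypothetical, chosen only for this illustration. The median still needs a sort here, so the overall cost stays O(n log n); only the two extra loops go away.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of the code above.
class OnePassStats {

    /** Returns {max, mean, sample std dev, median, min}, the same layout as getSummaryStatistics. */
    static double[] summarize(double[] a) {
        if (a.length == 0) {
            throw new IllegalArgumentException("Array is empty, please verify the values.");
        }
        double min = a[0];
        double max = a[0];
        double mean = 0.0;
        double m2 = 0.0; // running sum of squared deviations from the current mean

        // Single pass: min, max, and Welford's update for mean and variance.
        for (int i = 0; i < a.length; i++) {
            double x = a[i];
            if (x < min) min = x;
            if (x > max) max = x;
            double delta = x - mean;
            mean += delta / (i + 1);
            m2 += delta * (x - mean);
        }
        // Divide by (n - 1) for the sample standard deviation; 0 for a single value.
        double std = (a.length > 1) ? Math.sqrt(m2 / (a.length - 1)) : 0.0;

        // The median still needs order information. Sorting a copy keeps the
        // caller's array untouched, at the cost of one extra allocation.
        double[] sorted = a.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        double median = (sorted.length % 2 != 0)
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;

        return new double[] {max, mean, std, median, min};
    }
}
```

Calling OnePassStats.summarize(ar) in place of getSummaryStatistics(ar) in the test would exercise it. In my actual use case the same updates could live directly inside the loop that fills the array, as long as the values are also stored for the median.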
Thank you in advance