I am new to Java and have been using it with the Esper CEP engine. This question is unrelated to Esper, however; it's more of a general Java question.
First, my class:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.commons.math3.stat.descriptive.DescriptiveStatistics;
import com.espertech.esper.epl.agg.AggregationSupport;
import com.espertech.esper.epl.agg.AggregationValidationContext;

public class CustomPercentiles extends AggregationSupport {
    private List<Double> numbers = new ArrayList<Double>();

    public CustomPercentiles() {
        super();
    }

    public void clear() {
        numbers.clear();
    }

    public void enter(Object arg0) {
        Double value = (Double) (double) (Integer) arg0;
        if (value > 0) {
            //Not interested in < 1
            numbers.add(value);
        }
    }

    public void leave(Object arg0) {
        Double value = (Double) (double) (Integer) arg0;
        if (value > 0) {
            //Not interested in < 1
            numbers.remove(value);
        }
    }

    public Object getValue() {
        DescriptiveStatistics stats = new DescriptiveStatistics();
        Map<String, Integer> result = new HashMap<String, Integer>();
        for (Double number : numbers.subList(0, numbers.size())) {
            stats.addValue(number);
        }
        result.put("median", (int) stats.getPercentile(50));
        result.put("pct90", (int) stats.getPercentile(90));
        result.put("pct10", (int) stats.getPercentile(10));
        result.put("mean", (int) stats.getMean());
        result.put("std", (int) stats.getStandardDeviation());
        return result;
    }

    public Class getValueType() {
        return Object.class;
    }

    @Override
    public void validate(AggregationValidationContext arg0) {
        // TODO Auto-generated method stub
    }
}
Basically, Esper calls enter(value) and leave(value) whenever it wants, based on logic that is irrelevant here, and it calls getValue() to get the computed results.
Since I want to calculate percentiles, I need all the numbers available at once. To do this, I store them in a field called numbers (one list per instance), and in getValue() I copy all the numbers into a new DescriptiveStatistics instance and then compute the stats I need.
My presumption is that each time I load the list into a new DescriptiveStatistics object, it has to sort the data again. Is there some way I can keep a DescriptiveStatistics-like object as the field itself, instead of rebuilding it on every call?
The only reason I use an ArrayList rather than a DescriptiveStatistics as the field is that DescriptiveStatistics does not have a remove method, i.e. I cannot remove an element by value.
In practice, there are hundreds of instances of this class running at any given time, and getValue() is called on each of them every 1 to 10 seconds. I don't have any performance issues at the moment, but I am looking for optimization advice to avoid future problems.
Alternate explanation:
What I am doing here is maintaining a list of numbers. Esper calls the enter() and leave() methods many times to tell me which numbers should remain in the list. In my case this is a time-based aggregation: I've told Esper that I want to compute over the numbers from the last 1 minute.
At 00:00:00 Esper calls enter(10)
numbers becomes [10]
At 00:00:05 Esper calls enter(15)
numbers becomes [10, 15]
At 00:00:55 Esper calls enter(10)
numbers becomes [10, 15, 10]
At 00:01:00 Esper calls leave(10)
numbers becomes [15, 10]
At 00:01:05 Esper calls leave(15)
numbers becomes [10]
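The same sequence can be replayed against a plain ArrayList<Double>. This is only a sketch (the class name WindowTrace is made up for illustration, and the Esper calls are replaced by direct list operations). It shows that List.remove(Object) drops the first matching value, so the older 10 leaves at 00:01:00 and a single 10 is left at the end.

```java
import java.util.ArrayList;
import java.util.List;

public class WindowTrace {
    // Replays the enter/leave calls from the timeline and returns the final list.
    static List<Double> replay() {
        List<Double> numbers = new ArrayList<Double>();
        numbers.add(10.0);                     // 00:00:00 enter(10) -> [10.0]
        numbers.add(15.0);                     // 00:00:05 enter(15) -> [10.0, 15.0]
        numbers.add(10.0);                     // 00:00:55 enter(10) -> [10.0, 15.0, 10.0]
        numbers.remove(Double.valueOf(10.0));  // 00:01:00 leave(10) -> [15.0, 10.0]
        numbers.remove(Double.valueOf(15.0));  // 00:01:05 leave(15) -> [10.0]
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints [10.0]
    }
}
```

Each remove(Object) call scans the list from the front, so leave() is linear in the window size.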
During this period getValue() may have been called numerous times. Each time it is called, it is expected to return calculations based on the current contents of numbers.
getValue() calculates the 10th, 50th and 90th percentiles. To calculate percentiles, DescriptiveStatistics needs to sort the numbers. (Roughly speaking, the 10th percentile of 100 numbers is the 10th number in the list after sorting it.)
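To make the "10th number after sorting" idea concrete, here is a minimal nearest-rank sketch. NearestRank and percentile are hypothetical names used only for illustration; as far as I can tell, DescriptiveStatistics interpolates between neighbouring values by default, so its results can differ slightly from this simpler rule.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class NearestRank {
    // Nearest-rank percentile: sort a copy, then take the ceil(p * n / 100)-th value (1-based).
    static double percentile(List<Double> values, double p) {
        if (values.isEmpty() || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need data and 0 < p <= 100");
        }
        List<Double> sorted = new ArrayList<Double>(values);
        Collections.sort(sorted); // O(n log n) on every call
        int rank = (int) Math.ceil(p * sorted.size() / 100.0);
        return sorted.get(rank - 1);
    }
}
```

With the values 1 to 100, percentile(values, 10) picks the 10th sorted value. The sort on every call is the cost I would like to avoid.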
So I'm looking for a way to remove an arbitrary number, identified by its value, from a DescriptiveStatistics instance. Alternatively, I'd welcome a recommendation for another library that can give me medians and percentiles while letting me remove a number from the collection by value.
DescriptiveStatistics has a removeMostRecentValue(), but that's not what I want to do.
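To make the request concrete, something along these lines is the kind of structure I have in mind. It is a rough sketch under my own assumptions: SortedWindow and its methods are hypothetical names, not part of Commons Math or Esper, and the percentile uses a simple nearest-rank rule rather than the estimator DescriptiveStatistics applies. It keeps values sorted in a TreeMap with a count per value, so a value can be removed without re-sorting.

```java
import java.util.Map;
import java.util.TreeMap;

public class SortedWindow {
    // value -> number of occurrences currently in the window, kept in ascending key order
    private final TreeMap<Double, Integer> counts = new TreeMap<Double, Integer>();
    private int size = 0;

    public void add(double value) {
        Integer c = counts.get(value);
        counts.put(value, c == null ? 1 : c + 1);
        size++;
    }

    // Removes one occurrence of value; returns false if it was not present.
    public boolean remove(double value) {
        Integer c = counts.get(value);
        if (c == null) {
            return false;
        }
        if (c == 1) {
            counts.remove(value);
        } else {
            counts.put(value, c - 1);
        }
        size--;
        return true;
    }

    // Nearest-rank percentile, found by walking the keys in ascending order.
    public double percentile(double p) {
        if (size == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need data and 0 < p <= 100");
        }
        int rank = (int) Math.ceil(p * size / 100.0);
        int seen = 0;
        for (Map.Entry<Double, Integer> e : counts.entrySet()) {
            seen += e.getValue();
            if (seen >= rank) {
                return e.getKey();
            }
        }
        throw new IllegalStateException("size is out of sync with counts");
    }

    public int size() {
        return size;
    }
}
```

Here add and remove are O(log n), while percentile walks the distinct values in order. The mean and standard deviation would still need separate running sums, which this sketch leaves out.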