I am new to Java and have been using it with the Esper CEP engine. This question is unrelated to Esper, however; it's more of a Java question.

First, my class:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.commons.math3.stat.descriptive.DescriptiveStatistics;

import com.espertech.esper.epl.agg.AggregationSupport;
import com.espertech.esper.epl.agg.AggregationValidationContext;

public class CustomPercentiles extends AggregationSupport {
    private List<Double> numbers = new ArrayList<Double>();

    public CustomPercentiles(){
        super();
    }

    public void clear() {
        numbers.clear();
    }

    public void enter(Object arg0) {
        Double value = (Double) (double) (Integer) arg0;
        if (value > 0){
            //Not interested in < 1
            numbers.add(value);         
        }
    }

    public void leave(Object arg0) {
        Double value = (Double) (double) (Integer) arg0;
        if (value > 0){
            //Not interested in < 1
            numbers.remove(value);          
        }
    }

    public Object getValue() {
        DescriptiveStatistics stats = new DescriptiveStatistics();
        Map<String, Integer> result = new HashMap<String, Integer>();
        for (Double number:numbers.subList(0, numbers.size())){
            stats.addValue(number);     
        }
        result.put("median", (int) stats.getPercentile(50));
        result.put("pct90", (int) stats.getPercentile(90));
        result.put("pct10", (int) stats.getPercentile(10));
        result.put("mean", (int) stats.getMean());
        result.put("std", (int) stats.getStandardDeviation());

        return result ;
    }

    public Class getValueType() {
        return Object.class;
    }

    @Override
    public void validate(AggregationValidationContext arg0) {
        // TODO Auto-generated method stub
    }

}

Basically, Esper calls enter(value) and leave(value) whenever it wants, based on logic that is irrelevant here, and it calls getValue() to get the computed results.

Since I want to calculate percentiles, I need all the numbers to be available. To do this, I store them in a field, a list called numbers, and in getValue() I put all the numbers into a DescriptiveStatistics instance and then compute the stats I need.

My presumption is that each time I feed the list into a new DescriptiveStatistics object, it has to sort the numbers again. Is there some way I can keep a DescriptiveStatistics-like object as the field instead?

The only reason I use an ArrayList rather than a DescriptiveStatistics as the field is that DescriptiveStatistics does not have a remove method, i.e. I cannot remove an element by value.

In practice, there are hundreds of instances of this class running at any given time, and getValue() is called on each of them every 1 to 10 seconds. I don't have any performance issues at the moment, but I am looking for some optimization help to avoid future problems.

Alternate explanation:

What I am doing here is maintaining a list of numbers. Esper calls the enter() and leave() methods many times to tell me which numbers should remain in the list. In my case this is a time-based aggregation: I've told Esper that I want to compute over the numbers from the last 1 minute.

At 00:00:00 Esper calls enter(10)
numbers becomes [10]
At 00:00:05 Esper calls enter(15)
numbers becomes [10, 15]
At 00:00:55 Esper calls enter(10)
numbers becomes [10, 15, 10]
At 00:01:00 Esper calls leave(10)
numbers becomes [15, 10]
At 00:01:05 Esper calls leave(15)
numbers becomes [10]

Now in this duration getValue() may have been called numerous times. Each time it is called, it is expected to return calculations based off the current contents of numbers.
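The walkthrough above can be replayed with a plain list. This is only a sketch: the class name WindowTimeline is made up for illustration, and it takes int arguments directly rather than the Object that Esper passes. It relies on List.remove(Object) removing only the first matching element, which is why one of the two 10s survives the first leave(10).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the aggregation class; not part of Esper.
final class WindowTimeline {
    private final List<Double> numbers = new ArrayList<>();

    void enter(int value) {
        numbers.add((double) value);
    }

    void leave(int value) {
        // Passing a boxed Double selects remove(Object), which drops
        // only the first element equal to the value.
        numbers.remove(Double.valueOf(value));
    }

    List<Double> snapshot() {
        return new ArrayList<>(numbers);
    }

    public static void main(String[] args) {
        WindowTimeline w = new WindowTimeline();
        w.enter(10);
        w.enter(15);
        w.enter(10);
        System.out.println(w.snapshot()); // [10.0, 15.0, 10.0]
        w.leave(10);
        System.out.println(w.snapshot()); // [15.0, 10.0]
        w.leave(15);
        System.out.println(w.snapshot()); // [10.0]
    }
}
```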

getValue() calculates the 10th, 50th and 90th percentiles. To calculate percentiles, DescriptiveStatistics needs to sort the numbers (roughly speaking, the 10th percentile of 100 numbers is the 10th number of the list after sorting it).
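To make the "10th number after sorting" idea concrete, here is a minimal sketch of the nearest-rank percentile, where the p-th percentile of n sorted values is the value at rank ceil(p * n / 100). NearestRank is a hypothetical helper written for this illustration; as far as I know, DescriptiveStatistics interpolates between neighbouring values instead, so its results can differ slightly from this.

```java
import java.util.Arrays;

// Hypothetical helper; not part of Commons Math.
final class NearestRank {
    // Returns the p-th percentile (0 < p <= 100) using the nearest-rank rule.
    static double percentile(double[] values, double p) {
        if (values.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need values and 0 < p <= 100");
        }
        double[] sorted = values.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);              // this is the cost paid on every call
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        double[] values = new double[100];
        for (int i = 0; i < values.length; i++) {
            values[i] = 100 - i; // 100, 99, ..., 1 (deliberately unsorted)
        }
        System.out.println(percentile(values, 10)); // 10.0
        System.out.println(percentile(values, 50)); // 50.0
        System.out.println(percentile(values, 90)); // 90.0
    }
}
```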

So I'm looking for a way to remove an arbitrary number, by value, from a DescriptiveStatistics instance. Alternatively, I'd welcome a recommendation for another library that can give me medians and percentiles while letting me remove a number from the list by its value.

DescriptiveStatistics has a removeMostRecentValue(), but that's not what I want to do.

sajal
  • As far as I know, there's no good reason for casting an object more than once. To my knowledge, `Double value = (Double) (double) (Integer) arg0;` should be `Double value = (Double) arg0;`. Also, comments in code should generally answer _why_ you do something, not _what_. As for your actual question, I'm afraid I can't wrap my head around what you're actually asking for, sorry. Could you try specifying your problem more? – Aske B. Aug 29 '12 at 14:34
  • Why do your `enter()`- and `leave()`-methods take an `Object` as a parameter, when the only allowed value is really a `double`? – Aske B. Aug 29 '12 at 14:40
  • And why do you do `numbers.subList(0, numbers.size())`? It produces the same result as just putting in `numbers`, except it calls an unnecessary method. – Aske B. Aug 29 '12 at 15:06
  • Okay, I've read your question through many times now... But what is your question exactly? I don't see it anywhere. – Aske B. Aug 29 '12 at 15:09
  • Okay I read about the [DescriptiveStatistics](http://www.koders.com/java/fid43647D11FB2F898044E290450F139DC892BA0D19.aspx?s=IsNa)-class, it didn't really help me. You're saying you have an issue with sorting or something? And yes, you can add any number of field variables to your class. Can you specify what you're trying to achieve? – Aske B. Aug 29 '12 at 15:27

1 Answer

To my understanding, you're asking for a way to use a DescriptiveStatistics object as the list, instead of "numbers". That is, you want to dynamically add numbers to and remove numbers from the DescriptiveStatistics instance.

As far as I can see, DescriptiveStatistics itself offers no way to remove an element by value, so keeping your own list, as you do now, is a reasonable approach.

Are you sure that you need the feature to remove a specific number from the list, before calculating the percentile again? Wouldn't it always be new numbers?

It sounds a bit like you would want to learn some more basics of Java.

Anyway, since I can't really give you a qualified answer to your question, I figured I would at least help you with correcting some of your code, to follow better practices:

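import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.commons.math3.stat.descriptive.DescriptiveStatistics;

import com.espertech.esper.epl.agg.AggregationSupport;
import com.espertech.esper.epl.agg.AggregationValidationContext;

public class CustomPercentiles extends AggregationSupport {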
    private List<Double> numbers = new ArrayList<Double>();

    //Methods inherited from superclasses and interfaces
    //should carry the "@Override" annotation,
    //both so the compiler can check that they really are inherited
    //and to make it clearer which methods are new in this class.
    @Override
    public void clear() {
        numbers.clear();
    }

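    @Override
    public void enter(Object value) {
        //Esper passes a boxed number (an Integer in the question),
        //so go through Number instead of casting straight to double.
        double v = ((Number) value).doubleValue();
        if (v > 0){
            numbers.add(v);
        }
    }

    @Override
    public void leave(Object value) {
        double v = ((Number) value).doubleValue();
        if (v > 0){
            //Box explicitly so that remove(Object) is clearly the one called.
            numbers.remove(Double.valueOf(v));
        }
    }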

    @Override
    public Object getValue() {
        DescriptiveStatistics stats = new DescriptiveStatistics();
        Map<String, Integer> result = new HashMap<String, Integer>();
        //It is unnecessary to call numbers.subList(0, numbers.size())
        //since it just returns a view of the entire list.
        for (Double number : numbers){
            stats.addValue(number);        
        }
        result.put("median", (int) stats.getPercentile(50));
        result.put("pct90", (int) stats.getPercentile(90));
        result.put("pct10", (int) stats.getPercentile(10));
        result.put("mean", (int) stats.getMean());
        result.put("std", (int) stats.getStandardDeviation());

        return result ;
    }

    //Judging from the API of AggregationSupport,
    //this method should describe the type that getValue() returns
    //(it basically seems like a bad way of implementing generics).
    //Since getValue() returns a Map here, are you sure
    //it should return Object.class rather than Map.class?
    @Override
    public Class getValueType() {
        return Object.class;
    }

    @Override
    public void validate(AggregationValidationContext arg0) {
        // TODO Auto-generated method stub
    }

}
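If re-sorting in getValue() ever shows up in profiling, one alternative is to keep the values sorted as they arrive, so that a percentile becomes an index lookup. The following is only a sketch under assumptions: SortedWindow is a hypothetical helper (not part of Esper or Commons Math), and it uses the nearest-rank percentile definition, which can differ slightly from what DescriptiveStatistics returns.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: a list that is kept sorted at all times.
final class SortedWindow {
    private final List<Double> sorted = new ArrayList<>();

    // Inserts the value at its sorted position (binary search, then shift).
    void add(double value) {
        int idx = Collections.binarySearch(sorted, Double.valueOf(value));
        if (idx < 0) {
            idx = -idx - 1; // binarySearch encodes the insertion point
        }
        sorted.add(idx, value);
    }

    // Removes one occurrence of the value; returns false if it is absent.
    boolean remove(double value) {
        int idx = Collections.binarySearch(sorted, Double.valueOf(value));
        if (idx < 0) {
            return false;
        }
        sorted.remove(idx); // int argument, so this is remove(int index)
        return true;
    }

    // Nearest-rank percentile for 0 < p <= 100; no sorting needed here.
    double percentile(double p) {
        if (sorted.isEmpty() || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need values and 0 < p <= 100");
        }
        int rank = (int) Math.ceil(p * sorted.size() / 100.0);
        return sorted.get(rank - 1);
    }

    int size() {
        return sorted.size();
    }
}
```

In this sketch, enter() would call add(v), leave() would call remove(v), and getValue() would read percentile(10), percentile(50) and percentile(90). Insertion and removal still shift array elements, so whether this helps depends on how often getValue() is called relative to enter() and leave().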
Aske B.
  • The AggregationSupport class in the Esper API specifies the signatures of enter, leave and getValue; the OP is just following the spec in that regard. – Mike Tunnicliffe Aug 29 '12 at 15:59
  • @fd. Right, I edited my answer. To me, Esper seems like a big load of bad practices. However, I'm not familiar with percentile. [Looking it up](http://en.wikipedia.org/wiki/Percentile), it seems a bit complicated so unfortunately I can't really give good advice on an alternative. – Aske B. Aug 29 '12 at 16:18