Java : How to do aggregation over a list supporting min, max, avg, last kind of aggregations in each group

Question

I have done this earlier in MySQL itself, as that seems the proper way, but I have to do some business logic calculations and then need to apply the group by on the resulting list, any suggestions to do this in Java without compromising performance (have looked at lambdaj, seems it slows down due to heavy use of proxies, haven't tried though).

List<Item> contains name, value, unixtimestamp as properties, and is returned by the database. Each record is 5 mins apart.

I should be able to group by a dynamic sample time, say 1 hour, which means have to group every 12 records into one record, and then apply min, max, avg, last on each group.

Any suggestions appreciated.

[Update] Have the below working, yet to do aggregation on each of the list elements on the indexed map value. As you see, I created a map of lists, where key is the integer representation sample time requested (30 is the sample requested here).

private List<Item> performConsolidation(List<Item> items) {
        ListMultimap<Integer, Item> groupByTimestamp = ArrayListMultimap.create();
        List<Item> consolidatedItems = new ArrayList<>();
        for (Item item : items) {
            groupByTimestamp.put((int)floor(((Double.valueOf(item.getItem()[2])) / 1000) / (60 * 30)), item);
        }
        return consolidatedItems;
    }

What have you tried? When you say "lambda", do you actually mean "lambdaj" as the tag on the question implies? Your description seems pretty straightforward to implement. — Frank Pavageau, Jul 09 '14 at 11:45
Updated the question with what I have now, thanks for your rely, yes I meant "lambdaj". — Sandeep, Jul 09 '14 at 12:03
btw, The problem is that the Item class contains an array like this. `private String[] item = new String[4];` (I don't have control on this, cannot change this), and the index 0 corresponds to item name, 1 to value, 2 to timestamp. I need to aggregate min/max/last based on the index I created above, and pick only that item from each list of each index/key. — Sandeep, Jul 09 '14 at 12:10

skarist · Answer 1 · 2014-07-09T13:27:51.920

Here is one suggestion:

public Map<Long,List<Item>> group_items(List<Item> items,long sample_period) {
  Map<Long,List<Item>> grouped_result = new HashMap<Long,List<Item>>();
  long group_key;

  for (Item item: items) {
    group_key = item.timestamp / sample_period;
    if (grouped_result.containsKey(group_key)) {  
      grouped_result.get(group_key).add(item);
    }
    else {
      grouped_result.put(group_key, new ArrayList<Item>());
      grouped_result.get(group_key).add(item);
    }
  }
  return grouped_result;
}

sample_period is the number of seconds to group by: 3600 = hour, 900 = 15 mins

The keys in the map can of course be pretty big numbers (depending on the sample period), but this grouping will preserve the internal time order of the groups, i.e. lower keys are those that come first in the time order. If we assume that the data in the original list is ordered in time order we could of course get the value of the first key and then subtract that from the keys. That way we will get keys 0, 1, etc. In that case before the for loop starts we need:

int subtract = items.get(0).timestamp / sample_period; // note since both numbers a ints/longs we have a integer division

Then inside the for loop:

group_key = items.timestamp / sample_period - subtract;

Something along these lines will work, i.e. group your dataset as you describe. Then you can apply min max avg etc to the resulting lists. But since those functions will of course have to iterate over individual group lists again it is maybe better to incorporate those calculations into this solution, and have the function return something like Map where Aggregates is a new type containing fields for avg, min, max, and then a list of the items in the group? As for performance I would think this is acceptable. This is a plain O(N) solution. Edit:

ok just want to add a more complete solution/suggestion which also calculates the min, max and avg:

public class Aggregate {
  public double avg;
  public double min;
  public double max;

  public List<Item> items = new ArrayList<Item>();

  public Aggregate(Item item) {
    min = item.value;
    max = item.value;
    avg = item.value;
    items.add(item);
  }

  public void addItem(Item item) {
    items.add(item);
    if (item.value < this.min) {
      this.min = item.value;
    }
    else if (item.value > this.max) {
      this.max = item.value;
    }
    this.avg = (this.avg * (this.items.size() - 1) + item.value) / this.items.size(); 
  }
}

public Map<Long,Aggregate> group_items(List<Item> items,long sample_period) {

  Map<Long,Aggregate> grouped_result = new HashMap<Long,Aggregate>();
  long group_key;

  long subtract = items.get(0).timestamp / sample_period;
  for (Item item: items) {
    group_key = items.timestamp / sample_period - subtract;
    if (grouped_result.containsKey(group_key)) {  
      grouped_result.get(group_key).addItem(item);
    }
    else {
      grouped_result.put(group_key, new Aggregate(item));
    }
  }
  return grouped_result;
}

that is just a rough solution. We might want to add some more properties to the aggregate etc.

Ahh... sorry was working on the answer and posted before I saw that you had updated your question. But you seem to be going in the same direction as I suggest — skarist, Jul 09 '14 at 12:10

score 0 · Answer 2 · answered Jul 16 '14 at 14:38

Setting aside the computation of min/max/etc., I note that your performConsolidation method looks like it could use Multimaps.index. Just pass it the items and a Function<Item, Integer> that computes the value you want:

return (int) floor(((Double.valueOf(item.getItem()[2])) / 1000) / (60 * 30));

That won't save a ton of code, but it may make it easier to see what's happening at a glance: index(items, timeBucketer).

aburkov · Answer 3 · 2015-06-11T02:56:25.020

0

If you can use my xpresso project you can do the following:

Let your input list be:

list<tuple> items = x.list(x.tuple("name1",1d,100),x.tuple("name2",3d,105),x.tuple("name1",4d,210));

You first unzip your list of tuple to get a tuple of lists:

tuple3<list<String>,list<Double>,list<Integer>> unzipped = x.unzip(items, String.class, Double.class, Integer.class);

Then you can aggregate the way you want:

x.print(x.tuple(x.last(unzipped.value0), x.avg(unzipped.value1), x.max(unzipped.value2)));

The preceding will produce:

(name1,2.67,210)

edited Jun 11 '15 at 02:56

answered Jun 11 '15 at 02:27

aburkov

13
2
5

Make sure to mention whenever you're recommending something you're involved in the production of; silent (or excessive) self-promotion is frowned upon as spam. 5/5 posts is pretty much "excessive" already. – Nathan Tuggy Jun 11 '15 at 02:46
Thanks I'll keep in mind. Excessive even if helpful? – aburkov Jun 11 '15 at 02:58
Yeah; SO is not real thrilled when people appear to be joining and posting chiefly to promote their particular thing rather than simply to answer questions on a decent range of subjects (and promote their thing on those occasions it's appropriate). – Nathan Tuggy Jun 11 '15 at 02:59

Java : How to do aggregation over a list supporting min, max, avg, last kind of aggregations in each group

3 Answers3