
I am trying to do grouping and aggregation using Hazelcast Jet, but it is getting a little slow because I have to loop through the data twice: once for creating the grouping key and again for aggregating all the data. Is there a better and more feasible way to do it? Please help.

Here is my code. First I create a grouping key from my data, since the grouping is done by multiple keys:

// Fields with which I want to do the grouping, plus the count and sum fields.
List<String> fields1 = Arrays.asList("Field1", "Field4");
List<String> cAggCount = Arrays.asList("CountFiled");
List<String> sumField = Arrays.asList("SumFiled");
BatchStage<Map<Object, List<Object>>> aggBatchStageDataGroupBy = batchStage
        .aggregate(AggregateOperations.groupingBy(jdbcData -> {
            Map<String, Object> m = (Map<String, Object>) jdbcData;

            // Build a composite key by joining the values of the grouping fields.
            StringBuilder stringBuilder = new StringBuilder();
            fields1.forEach(dataValue -> {
                if (!m.containsKey(dataValue)) {
                    stringBuilder.append("null").append(",");
                } else {
                    Object k = m.get(dataValue);
                    stringBuilder.append(k == null ? "" : k).append(",");
                }
            });
            // Drop the trailing comma.
            return stringBuilder.substring(0, stringBuilder.length() - 1);
        }));
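
As a side note, the key-building lambda could be factored out into a helper so the same logic can also be passed to `groupingKey(...)` later. This is only a sketch under the assumption that each item is a `Map<String, Object>` row; `compositeKey` is an illustrative name, not part of the original code:

    import java.util.Objects;
    import java.util.stream.Collectors;

    // Illustrative helper (not in the original code): builds the same
    // comma-separated composite key from a row given as Map<String, Object>.
    static String compositeKey(Map<String, Object> row, List<String> keyFields) {
        return keyFields.stream()
                .map(f -> row.containsKey(f)
                        ? Objects.toString(row.get(f), "")   // present but null -> ""
                        : "null")                            // missing field -> "null"
                .collect(Collectors.joining(","));
    }

With this, the `groupingBy` call above becomes `AggregateOperations.groupingBy(row -> compositeKey((Map<String, Object>) row, fields1))`, and the same function can be reused for the single-pass variant sketched further down.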

And after that I do the aggregation on it as below:

BatchStage<List<Map<String, Object>>> aggBatchStageData = aggBatchStageDataGroupBy
        .map(data -> {
            List<Map<String, Object>> mapList = new ArrayList<>();
            data.entrySet().forEach(v -> {
                Map<String, Object> objectMap = new HashMap<>();
                // For each count field, store the number of rows in this group.
                IntStream.range(0, cAggCount.size()).forEach(k ->
                        objectMap.put(cAggCount.get(k), (long) v.getValue().size()));
                mapList.add(objectMap);
            });
            return mapList;
        });

So can we do this whole process in one go, instead of looping twice, i.e. grouping by key first and then aggregating it?
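
For reference, Hazelcast Jet can group and aggregate in a single pass with `groupingKey(...).aggregate(...)` instead of `AggregateOperations.groupingBy(...)` followed by a `map`. The sketch below shows one possible shape of that; `counting()`, `summingLong(...)` and `allOf(...)` are real `AggregateOperations` factories, but the assumption that `batchStage` is a `BatchStage<Map<String, Object>>`, the reuse of the `compositeKey` helper sketched above, and the field names are illustrative, not from the original code:

    import static com.hazelcast.jet.aggregate.AggregateOperations.allOf;
    import static com.hazelcast.jet.aggregate.AggregateOperations.counting;
    import static com.hazelcast.jet.aggregate.AggregateOperations.summingLong;

    import com.hazelcast.jet.datamodel.Tuple2;
    import com.hazelcast.jet.pipeline.BatchStage;
    import java.util.Map;

    // Assumption: batchStage is a BatchStage<Map<String, Object>> of rows.
    BatchStage<Map.Entry<String, Tuple2<Long, Long>>> aggregated = batchStage
            // 1) assign the composite grouping key in the same pass
            .groupingKey(row -> compositeKey(row, fields1))
            // 2) compute count and sum together for each key
            .aggregate(allOf(
                    counting(),
                    summingLong(row -> {
                        Object v = row.get("SumFiled");      // field name from the question
                        return v == null ? 0L : ((Number) v).longValue();
                    })
            ));

The result is one `Map.Entry` per distinct key, with the count in `f0()` and the sum in `f1()` of the `Tuple2`, so there is no intermediate `Map<Object, List<Object>>` to loop over afterwards.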

user3458271
  • I'm not sure what you're trying to achieve, but perhaps you confused `groupingKey` and `groupedBy`. This comment might be useful to you: https://github.com/hazelcast/hazelcast/blob/74132cefd0c6f77e63d96c666fb5864481079815/hazelcast/src/main/java/com/hazelcast/jet/aggregate/AggregateOperations.java#L991-L1015 – Oliv May 02 '22 at 10:44
  • @Oliv Thank you for the response. What I am trying to do: I have a CSV of about 200 MB and there are two fields on which I am doing aggregation (sum and count), grouping by 3 other fields/columns. The CSV gets read in 500 ms, but the aggregation and grouping take nearly 4 seconds, which is too high; I want to reduce it to 1 second. And my machine is quite powerful, with 16 GB RAM. – user3458271 May 03 '22 at 05:16
  • So is your question about how to do the aggregation, or about how to speed it up? For optimization you need to post some code we can run, but not in this question; this one is about _how_ to do aggregation. – Oliv May 04 '22 at 08:16
  • The question is about how to do the aggregation with grouping in a more optimized and faster way. – user3458271 May 05 '22 at 04:38
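
Following up on the `groupingKey` pointer in the first comment, here is a further illustrative sketch (not from the thread) showing how the per-key results from the single-pass version above could be mapped back into the `Map<String, Object>` row shape used in the question; the field names are the ones from the question, the `groupKey` entry and the `resultRows` name are assumptions:

    import java.util.HashMap;

    // Assumes the "aggregated" stage from the earlier sketch:
    // BatchStage<Map.Entry<String, Tuple2<Long, Long>>> aggregated = ...
    BatchStage<Map<String, Object>> resultRows = aggregated.map(entry -> {
        Map<String, Object> row = new HashMap<>();
        row.put("groupKey", entry.getKey());          // composite key, e.g. "v1,v4"
        row.put("CountFiled", entry.getValue().f0()); // count for this group
        row.put("SumFiled", entry.getValue().f1());   // sum for this group
        return row;
    });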

0 Answers