We need to group a huge data set by multiple fields, chosen dynamically. The data is stored in a Hazelcast Jet cluster. For example, suppose the Person class contains four fields: age, name, city, and country. We first need to group by city, then by country, and then optionally by name, depending on conditional parameters.

We already tried using a distributed collection, but it did not work. When we tried the Pipeline API instead, it threw an error.

Code:

    IMap res = client.getMap("res"); // res is a distributed map
    Pipeline p = Pipeline.create();
    p.drawFrom(Sources.<Person>list("inputList"))
     .aggregate(AggregateOperations.groupingBy(Person::getCountry))
     .drainTo(Sinks.map(res));
    JobConfig jobConfig = new JobConfig();
    jobConfig.addClass(Person.class);
    jobConfig.addClass(HzJetListClientPersonMultipleGroupBy.class);
    Job job = client.newJob(p, jobConfig);
    job.join();

Then we read from the map in the client and destroy it.

Error Message on the server:

Caused by: java.lang.ClassCastException: java.util.HashMap cannot be cast to java.util.Map$Entry

Oliv
Bharat

1 Answer

groupingBy aggregates all the input items into a HashMap where the key is extracted using the given function. In your case it aggregates a stream of Person items into a single HashMap<String, List<Person>> item.
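To see why the map sink rejects that output, here is a small plain-Java analogy. It uses java.util.stream rather than Jet itself, and the Person class and sample data are made up for illustration. The point it tries to show: a global groupingBy yields one Map object, and a Map is not a Map.Entry, which is what the map sink expects for each item.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Plain-Java analogy (java.util.stream, not Jet): a global groupingBy
// produces a single Map, not a sequence of Map.Entry items.
class GlobalGroupingSketch {

    // Hypothetical stand-in for the Person class from the question.
    static final class Person {
        final String name;
        final String country;

        Person(String name, String country) {
            this.name = name;
            this.country = country;
        }

        String getCountry() {
            return country;
        }
    }

    // Made-up sample data.
    static List<Person> samplePeople() {
        return Arrays.asList(
                new Person("Ann", "FR"),
                new Person("Bob", "FR"),
                new Person("Cid", "IN"));
    }

    // Global aggregation: the whole input collapses into ONE result object.
    static Object aggregateGlobally(List<Person> people) {
        return people.stream().collect(Collectors.groupingBy(Person::getCountry));
    }

    public static void main(String[] args) {
        Object single = aggregateGlobally(samplePeople());
        System.out.println(single instanceof Map);        // true
        System.out.println(single instanceof Map.Entry);  // false
    }
}
```

Casting that single object to Map.Entry fails in the same way as the ClassCastException reported in the question.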

You need to use this:

        p.drawFrom(Sources.<Person>list("inputList"))
         .groupingKey(Person::getCountry)
         .aggregate(AggregateOperations.toList())
         .drainTo(Sinks.map(res));

This will populate the res map with one entry per country, where the key is the country and the value is the list of persons in that country.

Remember, without groupingKey() the aggregation is always global. That is, all items in the input will be aggregated to one output item.
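For the dynamic multi-field part of the question, one possible approach (not something the snippet above covers) is to build a composite key from whichever fields are selected at runtime and pass that key function to groupingKey(). The sketch below shows the idea in plain Java, with java.util.stream standing in for Jet. Person, compositeKey, groupBy, and the sample data are hypothetical names introduced for illustration. In a real Jet job the key function and the key type would also have to be serializable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Plain-Java sketch (not Jet): group by a key assembled from fields chosen at runtime.
class DynamicGroupingSketch {

    // Hypothetical stand-in for the Person class from the question.
    static final class Person {
        final int age;
        final String name;
        final String city;
        final String country;

        Person(int age, String name, String city, String country) {
            this.age = age;
            this.name = name;
            this.city = city;
            this.country = country;
        }

        String getName() { return name; }
        String getCity() { return city; }
        String getCountry() { return country; }
    }

    // Made-up sample data.
    static List<Person> samplePeople() {
        return Arrays.asList(
                new Person(30, "Ann", "Paris", "FR"),
                new Person(41, "Bob", "Paris", "FR"),
                new Person(52, "Ann", "Paris", "FR"),
                new Person(27, "Cid", "Pune", "IN"));
    }

    // Builds a key function that collects the selected field values into a list.
    // Lists compare by content, so equal field values produce equal keys.
    static Function<Person, List<String>> compositeKey(List<Function<Person, String>> fields) {
        return person -> {
            List<String> key = new ArrayList<>();
            for (Function<Person, String> field : fields) {
                key.add(field.apply(person));
            }
            return key;
        };
    }

    // Groups by city and country, and additionally by name when requested.
    static Map<List<String>, List<Person>> groupBy(List<Person> people, boolean alsoByName) {
        List<Function<Person, String>> fields = new ArrayList<>();
        fields.add(Person::getCity);
        fields.add(Person::getCountry);
        if (alsoByName) {
            fields.add(Person::getName);
        }
        return people.stream().collect(Collectors.groupingBy(compositeKey(fields)));
    }

    public static void main(String[] args) {
        System.out.println(groupBy(samplePeople(), false).size()); // 2
        System.out.println(groupBy(samplePeople(), true).size());  // 3
    }
}
```

A single composite key keeps the result flat (one entry per distinct combination of field values), which fits a map sink better than nested maps would.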

Oliv