groupingBy operation in Java-8

Question

I'm trying to rewrite the well-known Spark text classification example (http://chimpler.wordpress.com/2014/06/11/classifiying-documents-using-naive-bayes-on-apache-spark-mllib/) in Java 8.

I have a problem: in this code I'm preparing the data needed to compute the IDFs of all words in all files:

    termDocsRdd.collect().stream().flatMap(doc -> doc.getTerms().stream()
                                .map(term -> new ImmutableMap.Builder<String, String>()
                                .put(doc.getName(),term)
                                .build())).distinct()

And I'm stuck on the groupBy operation. I need to group this by term, so that each term is a key and its value is the sequence of documents containing it. In Scala this operation looks very simple: .groupBy(_._2). But how can I do this in Java?

I tried to write something like:

    .groupingBy(term -> term, mapping((Document) d -> d.getDocNameContainsTerm(term), toList()));

but it's incorrect...

Does anybody know how to write this in Java?

Thank you very much.

score 2 · Accepted Answer · edited Oct 23 '14 at 17:33

If I understand you correctly, you want to do something like this:

(import static java.util.stream.Collectors.*;)

    Map<Term, Set<Document>> collect = termDocsRdd.collect().stream().flatMap(
            doc -> doc.getTerms().stream().map(term -> new AbstractMap.SimpleEntry<>(doc, term)))
        .collect(groupingBy(Map.Entry::getValue, mapping(Map.Entry::getKey, toSet())));

The use of Map.Entry / AbstractMap.SimpleEntry is due to the absence of a standard Pair<K,V> class in Java 8. Map.Entry implementations can fill this role, but at the cost of type and method names (getKey, getValue) that are verbose and unintuitive for a general-purpose pair.
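
To make the pattern easier to try outside of Spark, here is a minimal self-contained sketch of the same idea. It assumes plain Strings in place of the question's Term and Document types, and the class name TermIndexSketch, the method docsByTerm, and the sample file names are made up for illustration only:

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.mapping;
import static java.util.stream.Collectors.toSet;

// Hypothetical stand-in for the question's data: document name -> its terms.
class TermIndexSketch {

    // Inverts "document -> terms" into "term -> set of documents containing it".
    static Map<String, Set<String>> docsByTerm(Map<String, List<String>> termsByDoc) {
        return termsByDoc.entrySet().stream()
            // Emit one (document, term) pair per term occurrence.
            .flatMap(doc -> doc.getValue().stream()
                .<Map.Entry<String, String>>map(
                    term -> new AbstractMap.SimpleEntry<>(doc.getKey(), term)))
            // Group by the term (entry value); collect the document names (entry key).
            .collect(groupingBy(Map.Entry::getValue,
                     mapping(Map.Entry::getKey, toSet())));
    }

    public static void main(String[] args) {
        Map<String, List<String>> termsByDoc = new LinkedHashMap<>();
        termsByDoc.put("a.txt", Arrays.asList("spark", "java", "spark"));
        termsByDoc.put("b.txt", Arrays.asList("spark", "scala"));

        // Iteration order of the resulting map and sets is unspecified.
        System.out.println(docsByTerm(termsByDoc));
    }
}
```

Because the downstream collector is toSet(), a term that occurs several times in the same document still maps that document only once, so no separate distinct() step is needed in this sketch.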

If you are using the current Eclipse version (I tested with LunaSR1 20140925) with its limited type inference, you have to help the compiler a little bit:

    Map<Term, Set<Document>> collect = termDocsRdd.collect().stream().flatMap(
            doc -> doc.getTerms().stream().<Map.Entry<Document,Term>>map(term -> new AbstractMap.SimpleEntry<>(doc, term)))
        .collect(groupingBy(Map.Entry::getValue, mapping(Map.Entry::getKey, toSet())));

edited Oct 23 '14 at 17:33

glglgl

answered Oct 15 '14 at 11:21

Holger


Thanks for the answer! But when I try to use your code in Eclipse, I get a compile error: **Type mismatch: cannot convert from** `Map – dimson Oct 16 '14 at 04:07
I’m afraid Eclipse’s Java 8 support still needs more time to mature. – Holger Oct 16 '14 at 07:39
Thanks Man! I'll try another IDE. – dimson Oct 16 '14 at 07:43
