I have a class IndexEntry which looks like this:
public class IndexEntry implements Comparable<IndexEntry>
{
    private String word;
    private int frequency;
    private int documentId;
    ...
    // Simple getters for all properties
    public int getFrequency()
    {
        return frequency;
    }
    ...
}
I am storing objects of this class in a Guava SortedSetMultimap (which allows for multiple values per key), where I map a String word to some IndexEntry objects. Behind the scenes, it maps each word to a SortedSet<IndexEntry>.
I am trying to implement a sort of indexed structure of words to documents and their occurrence frequencies inside the documents.
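To make the structure concrete, here is a rough JDK-only sketch of how it behaves. The names IndexSketch and Posting are made up for illustration, the ordering by document id is an assumption (the real compareTo is not shown above), and a TreeMap of TreeSets only approximates Guava's SortedSetMultimap:

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical stand-in: a TreeMap of TreeSets approximates
// SortedSetMultimap<String, IndexEntry>.
class IndexSketch {

    // Simplified, made-up substitute for IndexEntry (the word is the map key here).
    static final class Posting {
        final int documentId;
        final int frequency;

        Posting(int documentId, int frequency) {
            this.documentId = documentId;
            this.frequency = frequency;
        }
    }

    // Assumed ordering: postings are sorted by document id.
    private static final Comparator<Posting> BY_DOCUMENT =
            Comparator.comparingInt(p -> p.documentId);

    // word -> sorted set of (documentId, frequency) postings
    private final Map<String, SortedSet<Posting>> entries = new TreeMap<>();

    void add(String word, int documentId, int frequency) {
        entries.computeIfAbsent(word, w -> new TreeSet<>(BY_DOCUMENT))
               .add(new Posting(documentId, frequency));
    }

    SortedSet<Posting> getEntriesOfWord(String word) {
        // Like a multimap, an unknown word yields an empty set rather than null.
        return entries.getOrDefault(word, Collections.emptySortedSet());
    }
}
```

Each word therefore owns one sorted set, and every element of that set records how often the word occurs in one document.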
I know how to get the count of the most common word, but I can't seem to get the word itself.
Here is what I have to get the count of the most common term, where entries is the SortedSetMultimap, along with helper methods:
public int mostFrequentWordFrequency()
{
    return entries
        .keySet()
        .stream()
        .map(this::totalFrequencyOfWord)
        .max(Comparator.naturalOrder()).orElse(0);
}

public int totalFrequencyOfWord(String word)
{
    return getEntriesOfWord(word)
        .stream()
        .mapToInt(IndexEntry::getFrequency)
        .sum();
}

public SortedSet<IndexEntry> getEntriesOfWord(String word)
{
    return entries.get(word);
}
I am trying to learn Java 8 features because they seem really useful. However, I can't seem to get the stream working the way I want. I want to have both the word and its frequency at the end of the stream, but failing that, if I have the word, I can very easily get the total occurrences of that word.
Currently, I keep ending up with a Stream<SortedSet<IndexEntry>>, which I can't do anything with. I don't know how to get the most frequent word without the frequencies, but if I have the frequency, I can't seem to keep track of the corresponding word. I tried creating a WordFrequencyPair POJO class to store both, but then I just had a Stream<SortedSet<WordFrequencyPair>>, and I couldn't figure out how to map that into something useful.
What am I missing?
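One direction that may help, offered as a rough sketch rather than a definitive answer: keep the stream typed over the words (or over word/total pairs) and let the comparator look up the totals, so the word is never mapped away. In the sketch below, MostFrequentWordSketch and its methods are hypothetical names, and a plain Map<String, List<Integer>> stands in for the multimap, with each list holding one word's per-document frequencies.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: a Map<String, List<Integer>> stands in for the multimap.
class MostFrequentWordSketch {

    // Sums one word's per-document frequencies.
    static int total(List<Integer> frequencies) {
        return frequencies.stream().mapToInt(Integer::intValue).sum();
    }

    // Keeps the stream as Stream<String> and compares the words by their totals.
    static Optional<String> mostFrequentWord(Map<String, List<Integer>> index) {
        return index.keySet().stream()
                .max(Comparator.comparingInt(word -> total(index.get(word))));
    }

    // Pairs each word with its total so both survive to the end of the stream.
    static Optional<Map.Entry<String, Integer>> mostFrequentWordWithTotal(
            Map<String, List<Integer>> index) {
        return index.entrySet().stream()
                .<Map.Entry<String, Integer>>map(
                        e -> new SimpleImmutableEntry<>(e.getKey(), total(e.getValue())))
                .max(Map.Entry.comparingByValue());
    }
}
```

Applied to the code in the question, the first variant would presumably become entries.keySet().stream().max(Comparator.comparingInt(this::totalFrequencyOfWord)), which yields an Optional<String>. The second variant computes each total only once, at the cost of an intermediate pair object.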