I am iterating through several text files and trying to find the top 20 words across all of them. I have code that finds the top 20 words in a single file, but I am struggling to extend it to several files.
I have a single static LinkedHashMap, shared across all files, in which I want to store every new word I come across (as a key) and update its value (the number of times it occurs) each time I encounter the word again. For example, if the first file contains 8000 instances of the word "the" and the next file contains 7000, I want the value of the key "the" to become 15000.
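The update described above is what `Map.merge` does: it stores the value when the key is absent, and otherwise combines the old and new values with the supplied function. A minimal sketch using the numbers from the example (the class `MergeDemo` and method `addCount` are hypothetical names, for illustration only):

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MergeDemo {
    // Adds the count found in one file to the running total for that word.
    // If the word is not in the map yet, merge stores countInFile as-is;
    // otherwise it replaces the old value with Long.sum(old, countInFile).
    static void addCount(Map<String, Long> totals, String word, long countInFile) {
        totals.merge(word, countInFile, Long::sum);
    }

    public static void main(String[] args) {
        Map<String, Long> totals = new LinkedHashMap<>();
        addCount(totals, "the", 8000); // count from the first file
        addCount(totals, "the", 7000); // count from the second file
        System.out.println(totals);    // prints {the=15000}
    }
}
```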
Here is my code:
import java.util.*;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.io.IOException;
import java.nio.file.*;
import java.util.Map.Entry;
import java.util.function.Function;
import java.io.File;
import java.nio.charset.StandardCharsets;

public class FileReaderTwo
{
    static LinkedHashMap<String, Long> top20Words = null;

    public static void main(String[] args)
    {
        File dir = new File("data/");
        for (File file : dir.listFiles())
        {
            // try-with-resources closes the file opened by Files.lines
            try (Stream<String> lines = Files.lines(file.toPath(), StandardCharsets.ISO_8859_1))
            {
                // Count each word in this file, then sort by count (highest first).
                // Note: this assignment replaces top20Words on every iteration.
                top20Words = lines
                        .flatMap(line -> Arrays.stream(line.toLowerCase().split("[\\(,\\).\\s+]+")))
                        .filter(word -> !word.isEmpty()) // skip "" produced by blank lines or leading delimiters
                        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                        .entrySet().stream()
                        .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                        .collect(Collectors.toMap(Entry::getKey, Entry::getValue, (u, v) -> u, LinkedHashMap::new));
            } catch (IOException e)
            {
                e.printStackTrace();
            }
        }
        System.out.println(top20Words);
    }
}
Note: I know that at the moment it prints every word. I wanted to deal with this issue first and limit the output to the top 20 later.
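One possible way to get a single set of counts across all files, sketched under the assumption that every regular file in `data/` is an ISO-8859-1 text file (as in the question): keep one map for the whole run and merge each file's words into it, rather than assigning a freshly collected map on every loop iteration as the code above does. Only after all files have been read is the map sorted and cut down with `limit(20)`. The class `TopWordsAcrossFiles` and its methods `addLines` and `topN` are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TopWordsAcrossFiles {
    // Splits each line into lowercase words (same delimiters as the question)
    // and adds 1 to the shared running total for every word seen.
    static void addLines(Stream<String> lines, Map<String, Long> totals) {
        lines.flatMap(line -> Arrays.stream(line.toLowerCase().split("[\\(,\\).\\s+]+")))
             .filter(word -> !word.isEmpty()) // split can yield "" for blank lines
             .forEach(word -> totals.merge(word, 1L, Long::sum));
    }

    // Returns the n most frequent words, highest count first.
    static List<Map.Entry<String, Long>> topN(Map<String, Long> totals, int n) {
        return totals.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        Map<String, Long> totals = new HashMap<>(); // one map shared by all files
        // Assumption: every regular file in data/ is an ISO-8859-1 text file.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(Paths.get("data"))) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) continue; // skip subdirectories
                try (Stream<String> lines = Files.lines(file, StandardCharsets.ISO_8859_1)) {
                    addLines(lines, totals);
                }
            }
        }
        topN(totals, 20).forEach(e -> System.out.println(e.getKey() + " " + e.getValue()));
    }
}
```

Because a `HashMap` has no defined iteration order, words with equal counts may come out in any order; appending `.thenComparing(Map.Entry.comparingByKey())` to the comparator would make ties deterministic.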