I'm working on an exercise where I need to write a class that reads the words from a .txt file and puts them in a HashSet. The problem is that if the text reads "I am Daniel, Daniel I am.", I end up with separate entries for "am" and "am.", and for "Daniel," and "Daniel", because the punctuation is kept as part of the word. How do I fix this?
Here's my code (I tried to use a regex, but I'm getting an exception):
import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashSet;
import java.util.Scanner;

public class WordCount {

    public static void main(String[] args) {
        try {
            File file = new File(args[0]);
            HashSet<String> set = readFromFile(file);
            set.forEach(word -> System.out.println(word));
        }
        catch(FileNotFoundException e) {
            System.err.println("File Not Found!");
        }
    }

    private static HashSet<String> readFromFile(File file) throws FileNotFoundException {
        HashSet<String> set = new HashSet<String>();
        Scanner scanner = new Scanner(file);
        while(scanner.hasNext()) {
            String s = scanner.next("[a-zA-Z]");
            set.add(s.toUpperCase());
        }
        scanner.close();
        return set;
    }
}
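
For reference, here is one possible direction, as a rough sketch and not a definitive fix. The exception is most likely an InputMismatchException: `scanner.next("[a-zA-Z]")` requires the whole next token to match the pattern, and `[a-zA-Z]` matches exactly one letter, so "I" passes but "am" does not. The sketch below instead changes the scanner's delimiter to "any run of non-letters", so punctuation never becomes part of a token. The class name `WordSet` and the method `extractWords` are made up for illustration, and the pattern assumes that only ASCII letters count as part of a word (so "don't" would be split into "DON" and "T").

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

// Hypothetical class name, used only for this sketch.
public class WordSet {

    // Collects the distinct upper-cased words from the scanner's input.
    // Assumption: a "word" is a run of ASCII letters; everything else
    // (spaces, commas, periods, digits, ...) is treated as a separator.
    public static Set<String> extractWords(Scanner scanner) {
        Set<String> words = new HashSet<>();
        scanner.useDelimiter("[^a-zA-Z]+");
        while (scanner.hasNext()) {
            String token = scanner.next();
            // Defensive guard: never store an empty string as a word.
            if (!token.isEmpty()) {
                words.add(token.toUpperCase());
            }
        }
        return words;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // With a path argument, read that file; otherwise fall back to
        // the sample sentence from the question.
        try (Scanner scanner = args.length > 0
                ? new Scanner(new File(args[0]))
                : new Scanner("I am Daniel, Daniel I am.")) {
            extractWords(scanner).forEach(System.out::println);
        }
    }
}
```

With the sample sentence, the resulting set should contain only I, AM and DANIEL (a HashSet does not guarantee any particular print order). An alternative would be to keep the default whitespace delimiter and strip the punctuation from each token with `replaceAll("[^a-zA-Z]", "")` before adding it to the set.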