
I'm trying to do an exercise where I need to create a class that reads the words from a .txt file and puts them in a HashSet. The problem is that if the text reads "I am Daniel, Daniel I am.", I end up with separate entries for "am" and "am.", and for "Daniel," and "Daniel". How do I fix this?
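The duplicates come from how Scanner splits its input: by default it breaks on whitespace, so any punctuation stays attached to the word. A small sketch that reproduces this by reading from a String instead of a file (TokenDemo and rawTokens are hypothetical names used only for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class TokenDemo {
    // Collects tokens exactly as Scanner.next() returns them with its default
    // delimiter (whitespace), so trailing punctuation stays attached.
    static List<String> rawTokens(String text) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [I, am, Daniel,, Daniel, I, am.]
        System.out.println(rawTokens("I am Daniel, Daniel I am."));
    }
}
```

"Daniel," and "am." are different strings from "Daniel" and "am", so the HashSet keeps all of them.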

Here's my code (I tried to use a regex, but I'm getting an exception):

import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashSet;
import java.util.Scanner;

public class WordCount {

    public static void main(String[] args) {
        try {
            File file = new File(args[0]);
            HashSet<String> set = readFromFile(file);
            set.forEach(word -> System.out.println(word));
        }
        catch(FileNotFoundException e) {
            System.err.println("File Not Found!");
        }

    }

    private static HashSet<String> readFromFile(File file) throws FileNotFoundException {
        HashSet<String> set = new HashSet<String>();
        Scanner scanner = new Scanner(file);
        while(scanner.hasNext()) {
            String s = scanner.next("[a-zA-Z]");
            set.add(s.toUpperCase());
        }
        scanner.close();
        return set;
    }


}
Daniel Oscar

1 Answer


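The exception (an InputMismatchException) is thrown when the Scanner tries to read a token that does not match the regex. scanner.next(pattern) only succeeds if the whole next token matches the pattern, and [a-zA-Z] matches exactly one letter, so the first multi-letter word, such as "am", fails on this line: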

String s = scanner.next("[a-zA-Z]");
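A minimal sketch reproducing this on the example sentence (PatternDemo and tryPattern are hypothetical names). The first token "I" is a single letter and matches, but "am" does not, so next("[a-zA-Z]") throws. According to the Scanner documentation, the scanner does not skip the token that caused an InputMismatchException, so the catch block can still read it with a plain next():

```java
import java.util.InputMismatchException;
import java.util.Scanner;

public class PatternDemo {
    // Logs what scanner.next("[a-zA-Z]") does for each token. The pattern
    // must match the WHOLE token, and "[a-zA-Z]" matches exactly one letter.
    static String tryPattern(String text) {
        StringBuilder log = new StringBuilder();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                try {
                    log.append("matched ").append(scanner.next("[a-zA-Z]")).append('\n');
                } catch (InputMismatchException e) {
                    // Undo the "matched " prefix appended before the failed call.
                    log.setLength(log.length() - "matched ".length());
                    // The failing token is still pending, so next() returns it.
                    log.append("InputMismatchException at ").append(scanner.next()).append('\n');
                    break;
                }
            }
        }
        return log.toString();
    }

    public static void main(String[] args) {
        // Prints "matched I", then "InputMismatchException at am"
        System.out.print(tryPattern("I am Daniel, Daniel I am."));
    }
}
```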

Instead of passing the regex to the Scanner, read each word and then remove the non-letter characters, as shown below.

String s = scanner.next();
s = s.replaceAll("[^a-zA-Z]", "");
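Put together with the question's program, the fix might look like the sketch below. It assumes a hypothetical helper readWords(Scanner) so the logic can be tried on a String without a file, and it adds one small extra check that is not in the answer: a token made only of punctuation (for example "-") becomes an empty string after replaceAll, so it is skipped rather than added to the set.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

public class WordCount {

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("Usage: java WordCount <file>");
            return;
        }
        try {
            Set<String> set = readFromFile(new File(args[0]));
            set.forEach(System.out::println);
        } catch (FileNotFoundException e) {
            System.err.println("File Not Found!");
        }
    }

    // try-with-resources closes the Scanner (and the file) automatically.
    static Set<String> readFromFile(File file) throws FileNotFoundException {
        try (Scanner scanner = new Scanner(file)) {
            return readWords(scanner);
        }
    }

    // Hypothetical helper: reads whitespace-separated tokens, strips every
    // character that is not an ASCII letter and stores the upper-cased
    // result, so "Daniel," and "Daniel" collapse into one entry.
    static Set<String> readWords(Scanner scanner) {
        Set<String> set = new HashSet<>();
        while (scanner.hasNext()) {
            String s = scanner.next().replaceAll("[^a-zA-Z]", "");
            // Skip tokens that were pure punctuation and are now empty.
            if (!s.isEmpty()) {
                set.add(s.toUpperCase());
            }
        }
        return set;
    }
}
```

For "I am Daniel, Daniel I am." this yields the three words I, AM and DANIEL (HashSet iteration order is unspecified). Note that [^a-zA-Z] also removes apostrophes and non-ASCII letters, so "don't" becomes "DONT"; adjust the character class if that matters for your text.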
Sreejith