I am learning Java on my own from a book. I read the chapter on text processing and wrapper classes and attempted the exercise below.
Word Counter
Write a program that asks the user for the name of a file. The program should display the number of words that the file contains.
import java.io.File;
import java.io.IOException;
import java.util.Scanner;
import java.util.StringTokenizer;

public class FileWordCounter {
    public static void main(String[] args) throws IOException {
        // Create a Scanner object for keyboard input
        Scanner keyboard = new Scanner(System.in);

        // Ask user for filename
        System.out.print("Enter the name of a file: ");
        String filename = keyboard.nextLine();

        // Open file for reading
        File file = new File(filename);
        Scanner inputFile = new Scanner(file);

        int words = 0; // running total of words across all lines read so far
        String word = "";

        while (inputFile.hasNextLine()) {
            String line = inputFile.nextLine();
            System.out.println(line); // for debugging

            // Create a StringTokenizer from the current line and the delimiter characters
            StringTokenizer stringTokenizer = new StringTokenizer(line, " \n.!?;,()");

            // Count every token on this line
            while (stringTokenizer.hasMoreTokens()) {
                word = stringTokenizer.nextToken();
                System.out.println(word); // for debugging
                words++;
            }
            System.out.println("Line contains " + words + " words");
        }

        // Close file
        inputFile.close();

        System.out.println("The file has " + words + " words.");
    }
}
I picked a poem at random from the internet to test the program and saved it in a file called TheSniper.txt:
Two hundred yards away he saw his head;
He raised his rifle, took quick aim and shot him.
Two hundred yards away the man dropped dead;
With bright exulting eye he turned and said,
'By Jove, I got him!'
And he was jubilant; had he not won
The meed of praise his comrades haste to pay?
He smiled; he could not see what he had done;
The dead man lay two hundred yards away.
He could not see the dead, reproachful eyes,
The youthful face which Death had not defiled
But had transfigured when he claimed his prize.
Had he seen this perhaps he had not smiled.
He could not see the woman as she wept
To the news two hundred miles away,
Or through his very dream she would have crept.
And into all his thoughts by night and day.
Two hundred yards away, and, bending o'er
A body in a trench, rough men proclaim
Sadly, that Fritz, the merry is no more.
(Or shall we call him Jack? It's all the same.)
Here is some of my output. For debugging purposes, I print each line, each word on it, and the running total of words in the file up to and including the current line (so the "Line contains" figure is cumulative, not per line).
Enter the name of a file: TheSniper.txt
Two hundred yards away he saw his head;
Two
hundred
yards
away
he
saw
his
head
Line contains 8 words
He raised his rifle, took quick aim and shot him.
He
raised
his
rifle
took
quick
aim
and
shot
him
Line contains 18 words
...
At the end, my program reports that the poem has 176 words, but Microsoft Word counts 174. From printing each word, I can see that I am miscounting words that contain apostrophes or are wrapped in single quotes. Here is the last section of the poem in my output, where the problem occurs:
(Or shall we call him Jack? It's all the same.)
Or
shall
we
call
him
Jack
It
s
all
the
same
Line contains 176 words
The file has 176 words.
When I leave the single quote out of the StringTokenizer delimiter string, the word "It's" is counted as one word. When I include it, "It's" is counted as two words ("It" and "s"), because the apostrophe, which looks the same as a single quote, is treated as a delimiter. On the other hand, the phrase 'By Jove, I got him!' is miscounted when I do not include the single quote/apostrophe as a delimiter. Are the apostrophe and the single quote the same character as far as delimiters are concerned? I'm not sure how to treat the single quotes that surround a phrase as delimiters without also splitting on the apostrophe inside a word like "It's". I hope my question is reasonably clear; please ask if anything needs clarification. Any guidance is appreciated. Thank you!
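To make the problem concrete: in a plain-text file typed with straight quotes, the apostrophe in "It's" and the quotes around 'By Jove, I got him!' are the same character ('), so a fixed delimiter list cannot tell them apart; only the surrounding characters can. One possible direction, as a minimal sketch rather than a definitive fix, is to stop listing delimiters and instead describe what a word looks like. The class name ApostropheWordCount, the method countWords, and the rule "an apostrophe belongs to a word only when it has a letter or digit on both sides" are assumptions made for illustration, not something from the book. The sketch also assumes straight apostrophes only; a curly apostrophe would need to be added to the pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ApostropheWordCount {
    // Assumed definition of a word: a run of letters or digits, optionally
    // followed by groups of one apostrophe plus more letters or digits.
    // This keeps "It's" and "o'er" whole but ignores quotes around a phrase.
    private static final Pattern WORD =
            Pattern.compile("[\\p{L}\\p{N}]+(?:'[\\p{L}\\p{N}]+)*");

    // Counts the words in one line of text using the pattern above.
    public static int countWords(String line) {
        Matcher matcher = WORD.matcher(line);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // The two lines from the poem that cause trouble with StringTokenizer
        System.out.println(countWords("'By Jove, I got him!'"));                           // prints 5
        System.out.println(countWords("(Or shall we call him Jack? It's all the same.)")); // prints 10
    }
}
```

In the original program, a method like this could replace the inner StringTokenizer loop by adding countWords(line) to the running total for each line. Whether the resulting total matches Microsoft Word's count depends on how Word itself defines a word, which this sketch does not try to replicate.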