I am practicing Java on my own from a book. I read the chapter on text processing and wrapper classes and attempted the exercise below.

Word Counter

Write a program that asks the user for the name of a file. The program should display the number of words that the file contains.

import java.io.File;
import java.io.IOException;
import java.util.Scanner;
import java.util.StringTokenizer;

public class FileWordCounter {

    public static void main(String[] args) throws IOException {

        // Create a Scanner object 
        Scanner keyboard = new Scanner(System.in);

        // Ask user for filename
        System.out.print("Enter the name of a file: ");
        String filename = keyboard.nextLine();

        // Open file for reading
        File file = new File(filename);
        Scanner inputFile = new Scanner(file);

        int words = 0;
        String word = "";

        while (inputFile.hasNextLine()) {
            String line = inputFile.nextLine();
            System.out.println(line); // for debugging
            StringTokenizer stringTokenizer = new StringTokenizer(line, " \n.!?;,()"); // Create a StringTokenizer object and use the current line contents and delimiters as parameters
            while (stringTokenizer.hasMoreTokens()) { // for each line do this
                word = stringTokenizer.nextToken();
                System.out.println(word); // for debugging
                words++;
            }
            System.out.println("Line contains " + words + " words");
        }

        // Close file
        inputFile.close();

        System.out.println("The file has " + words + " words.");
    }

}

I chose this random poem from online to test this program. I put the poem in a file called TheSniper.txt:

Two hundred yards away he saw his head;
He raised his rifle, took quick aim and shot him.
Two hundred yards away the man dropped dead;
With bright exulting eye he turned and said,
'By Jove, I got him!'
And he was jubilant; had he not won
The meed of praise his comrades haste to pay?
He smiled; he could not see what he had done;
The dead man lay two hundred yards away.
He could not see the dead, reproachful eyes,
The youthful face which Death had not defiled
But had transfigured when he claimed his prize.
Had he seen this perhaps he had not smiled.
He could not see the woman as she wept
To the news two hundred miles away,
Or through his very dream she would have crept.
And into all his thoughts by night and day.
Two hundred yards away, and, bending o'er
A body in a trench, rough men proclaim
Sadly, that Fritz, the merry is no more.
(Or shall we call him Jack? It's all the same.)

Here is some of my output. For debugging purposes, I print each line, each word found in it, and the running total of words in the file up to and including the current line.

Enter the name of a file: TheSniper.txt
Two hundred yards away he saw his head;
Two
hundred
yards
away
he
saw
his
head
Line contains 8 words
He raised his rifle, took quick aim and shot him.
He
raised
his
rifle
took
quick
aim
and
shot
him
Line contains 18 words
...

At the end, my program displays that the poem has 176 words. However, Microsoft Word counts 174 words. I see from printing each word that I am miscounting apostrophes and single quotes. Here is the last section of the poem in my output where the problem occurs:

(Or shall we call him Jack? It's all the same.)
Or
shall
we
call
him
Jack
It
s
all
the
same
Line contains 176 words
The file has 176 words

In my StringTokenizer delimiter list, when I don't include the single quote (which looks like an apostrophe), the word "It's" is counted as one word. When I do include it, "It's" is counted as two words (It and s) because the apostrophe is treated as a delimiter. Also, the phrase 'By Jove, I got him!' is miscounted when I don't delimit the single quote/apostrophe. Are the apostrophe and the single quote the same character when it comes to delimiting them? I'm not sure how to delimit single quotes that surround a phrase but not an apostrophe inside a word like "It's". I hope my question is reasonably clear; please ask for any clarification. Any guidance is appreciated. Thank you!

  • Is there any reason why you can't just use whitespace (space, tab, newline) as your delimiters? In the phrase `'By Jove, I got him!'` it doesn't matter if the first word is `'By` and the last is `him!'` for purposes of _counting_ words, even though it doesn't look as nice when printing out what words were found (which is only for debugging, per your comment). (also see http://stackoverflow.com/questions/8813779/) – Stephen P Jan 10 '17 at 01:38
  • Thank you! That makes sense. –  Jan 10 '17 at 03:28

1 Answer

Why not use another Scanner for each line to count the number of words?

    int words = 0;
    while (inputFile.hasNextLine()) {
        int lineLength = 0;
        Scanner lineScanner = new Scanner(inputFile.nextLine());
        while (lineScanner.hasNext()) {
            System.out.println(lineScanner.next());
            lineLength++;
        }
        System.out.println("Line contains " + lineLength + " words");
        words += lineLength;
    }

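I don't believe a StringTokenizer can treat the single quote as a delimiter around a phrase like "'By Jove, I got him!'" while ignoring it inside "it's", because it only works with a fixed set of delimiter characters. For that you would need a regex search that ignores single quotes in the middle of a word.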
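As a rough sketch of that regex idea (not something from the book, and the class and method names here are made up for illustration): a word is matched as a run of letters or digits that may contain apostrophes only between such characters. In plain ASCII text the apostrophe and the single quote are the same character, `'` (U+0027), so position is the only thing that tells them apart; the pattern also accepts the curly apostrophe U+2019 on the assumption that text copied from a web page might contain it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this example.
class ApostropheWordCounter {

    // A word: one or more letters/digits, optionally followed by groups of
    // "apostrophe + letters/digits". Quotes at the start or end of a phrase
    // are not followed or preceded by a letter on both sides, so they are skipped.
    private static final Pattern WORD =
            Pattern.compile("[\\p{L}\\p{N}]+(?:['\u2019][\\p{L}\\p{N}]+)*");

    // Counts the matches of WORD in a single line of text.
    static int countWords(String line) {
        Matcher matcher = WORD.matcher(line);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("'By Jove, I got him!'"));                            // prints 5
        System.out.println(countWords("(Or shall we call him Jack? It's all the same.)")); // prints 10
    }
}
```

In the original loop, `countWords(line)` would take the place of the inner StringTokenizer loop. Whether hyphenated or other unusual tokens should count as one word is a separate decision that this sketch does not try to settle.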

Alternatively, you could treat the characters ".!?;,()" as part of a word (e.g., "Jack?" is one word), so that only whitespace separates words. This is what the Scanner does by default, and it should give you the word count you expect. Just change the delimiter in your StringTokenizer to " " (\n isn't required since you're already reading the file line by line):

StringTokenizer stringTokenizer = new StringTokenizer(line, " ");
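A small variation on the line above, offered as a sketch: if the delimiter argument is left out entirely, StringTokenizer falls back to its default delimiter set (space, tab, newline, carriage return, form feed), which also copes with tab-separated words. The class and method names below are hypothetical, chosen only for this example.

```java
import java.util.StringTokenizer;

// Hypothetical helper class, named only for this example.
class WhitespaceWordCounter {

    // Counts whitespace-separated tokens; punctuation and quotes stay
    // attached to the neighbouring word, so "It's" and "him!'" are one token each.
    static int countWords(String line) {
        return new StringTokenizer(line).countTokens();
    }

    public static void main(String[] args) {
        System.out.println(countWords("'By Jove, I got him!'"));                            // prints 5
        System.out.println(countWords("(Or shall we call him Jack? It's all the same.)")); // prints 10
    }
}
```

Because the tokens keep their punctuation, this only suits counting; the printed debug words will still look like `'By` and `him!'`.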
daladier