I'm new to reading text from a file. I've got a task for which I need to print the number of words in a file.

I'm using TextEdit on macOS, which saves the file with an .rtf extension.

When I run the following program, I get the output 5 even when the document is empty. When I add words, the count doesn't increment correctly.

Thanks.

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class Analyze {

    public static void main(String[] args) throws FileNotFoundException {
        Scanner console = new Scanner(System.in);
        int words = 0;
        System.out.println("This is a word counter");
        System.out.println("File name");
        String filename = console.next();
        File file = new File(filename);

        // Count every whitespace-separated token in the file
        Scanner fileScanner = new Scanner(file);
        while (fileScanner.hasNext()) {
            fileScanner.next();
            words++;
        }
        fileScanner.close();

        System.out.println(words);
    }
}
Anthony J

3 Answers

The problem is that you are reading an RTF file.

A 'blank' (as in no entered text) RTF file generated with TextEdit looks like this:

{\rtf1\ansi\ansicpg1252\cocoartf1404\cocoasubrtf130
{\fonttbl}
{\colortbl;\red255\green255\blue255;}
\margl1440\margr1440\vieww10800\viewh8400\viewkind0
}

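As you can see, there are five lines, and none of them contains any whitespace, so Scanner reads each line as a single token. That is where the output of 5 comes from.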

Either parse RTF in your program, which I doubt you want to do, or switch TextEdit to plaintext mode. See here
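If you do want to keep the .rtf file, here is a minimal sketch of the first option. It assumes the RTF reader that ships with the JDK (javax.swing.text.rtf.RTFEditorKit) can cope with what TextEdit writes, which I have not checked against every TextEdit feature. The names RtfWordCount, rtfToText and countWords are made up for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Scanner;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

class RtfWordCount {

    // Lets the JDK's RTF reader strip the markup and returns only the visible text.
    static String rtfToText(InputStream in) {
        try {
            RTFEditorKit kit = new RTFEditorKit();
            Document doc = kit.createDefaultDocument();
            kit.read(in, doc, 0);
            return doc.getText(0, doc.getLength());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts whitespace-separated tokens, the same way the original loop does.
    static int countWords(String text) {
        int words = 0;
        try (Scanner tokens = new Scanner(text)) {
            while (tokens.hasNext()) {
                tokens.next();
                words++;
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is assumed to be the path of the .rtf file
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(countWords(rtfToText(in)));
        }
    }
}
```

Run it with the file path as the only argument. Switching TextEdit to plain text is still the simpler route if you don't need the formatting.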

andars

The file you're trying to count is an RTF file? Does it support italics, bold, font selection and things like that? In that case, it probably contains some data, even if there is no text. Your program does not care about the file format, so it naïvely reads everything as text.

Try running od or hexdump on your file from Terminal (both should be available on macOS). They print the exact bytes of a file. A truly empty file should not yield any output.

If your computer doesn't have the od or hexdump programs, you could try cat. It doesn't print the contents as numbers, so it doesn't give a 100% accurate view of special characters, but it should be able to demonstrate to you whether your file is empty or not.
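The same check can also be done from Java, if you'd rather stay in one language. This is only a rough sketch: HexPeek and hexPreview are names invented for the example, and it reads the whole file into memory, which is fine for a small document but not for a large one.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class HexPeek {

    // Formats up to `limit` bytes as two-digit hex values separated by spaces.
    static String hexPreview(byte[] data, int limit) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length && i < limit; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", data[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // args[0] is assumed to be the path of the file to inspect
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("size in bytes: " + data.length);
        System.out.println(hexPreview(data, 32));
    }
}
```

A truly empty file reports a size of 0 and an empty preview. An RTF file starts with the bytes for `{\rtf1`, that is 7b 5c 72 74 66 31.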

Snild Dolkow

Besides the RTF problem, also note what the Scanner documentation says:

A Scanner breaks its input into tokens using a delimiter pattern, which by default matches whitespace.

with whitespace as in

A whitespace character: [ \t\n\x0B\f\r]

so tokens are split on tabs, newlines, etc., not only on blanks. Anything between two runs of whitespace counts as one "word", punctuation included.
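A small sketch to see this in action, using a Scanner over a String instead of a file. DelimiterDemo and countTokens are names made up for the example.

```java
import java.util.Scanner;

class DelimiterDemo {

    // Counts tokens using Scanner's default whitespace delimiter.
    static int countTokens(String text) {
        int count = 0;
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                scanner.next();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Space, tab, newline and CRLF all separate tokens
        System.out.println(countTokens("one two\tthree\nfour\r\nfive")); // prints 5
        // Whitespace alone produces no tokens
        System.out.println(countTokens("   \n\t  "));                   // prints 0
        // No whitespace means one token, however long the line is
        System.out.println(countTokens("{\\colortbl;\\red255\\green255\\blue255;}")); // prints 1
    }
}
```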

tom