0

When I count the number of lines in a file, the program reports more lines than the file actually contains.

For example, my Word document has only 26 lines, but when I count them with a Java program it reports 118.

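    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.LineNumberReader;

    public class LineCount {
        public static void main(String[] args) throws IOException {
            File f = new File("C:\\Users\\os05\\Desktop\\Venkatesan(13-02-10).doc");
            int count = 0;
            // FileReader decodes the raw bytes with the platform default charset;
            // readLine() returns null once the end of the stream is reached.
            try (LineNumberReader ln = new LineNumberReader(new FileReader(f))) {
                while (ln.readLine() != null) {
                    count++;
                }
            }
            System.out.println("No of lines:" + count);
        }
    }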

How does the above code calculate the number of lines?

Jon Seigel
Venkat
  • Is that Word document a Microsoft Word document? – YOU Feb 25 '10 at 11:37
  • Yes, it is a Microsoft Word document. – Venkat Feb 25 '10 at 11:38
  • 16
    You should accept some answers – mmmmmm Feb 25 '10 at 11:38
  • 1
    Try opening your Word document in Notepad. That is what your Java code had to read through, poor little thing. :) Try reading the javadoc of the classes you want to use. Then you will see that FileReader is not a UniversalDocumentReader, just a simple file-bytes-to-characters converter (using the platform default encoding), and that LineNumberReader is a reader that recognizes end-of-line characters. – helios Feb 25 '10 at 11:44
  • @Mark 0/5 is not such a big deal, especially in 5 days. – Pascal Thivent Feb 25 '10 at 12:03
  • @Pascal: it's not a big deal, agreed. But the suggestion is still in order. Note how he said "should". – Joachim Sauer Feb 25 '10 at 14:02
  • @Joachim There is nothing wrong with having a question staying open during *5* days (for the oldest) IMO, you may not want to discourage additional answers. That should be taken into account, which is my point, this comment (and **especially** the votes) are premature. – Pascal Thivent Feb 25 '10 at 14:14
  • @Joachim Oh indeed, I didn't check his profile, only the balloon tooltip (*"this user has accepted an answer for 0 of 5 eligible questions"*). On the basis of the profile details, I guess you're right, the second option is more likely. But on the basis of the tooltip (which is what I use, I only checked his profile to see for how long he was here and didn't pay attention to the rest), I still think it was premature pressure. Anyway, I don't want to play the police of the police actually :) – Pascal Thivent Feb 25 '10 at 14:51
  • @Pascal: I think the "0 of 5" means that SO considers 5 of his questions old enough, and with enough answers, that he should have accepted an answer. It doesn't count very new questions or questions with too few (or no) answers. – Joachim Sauer Feb 25 '10 at 15:46

4 Answers

10

You're trying to treat a Word document as if it were a plain text file (*).

A Word document, however, is a binary file in a proprietary format that you need to interpret correctly to extract the information contained in it.
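To see why the count comes out too high, here is a minimal sketch of how the loop in the question behaves on non-text content. It assumes nothing about the real .doc format: the `pseudoBinary` string and the `TerminatorCountDemo` class are made up for illustration. The point is only that `readLine()` ends a line at every `\n`, `\r`, or `\r\n` it meets, whether or not that character was meant as a line break.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class TerminatorCountDemo {

    // Counts lines the same way the loop in the question does:
    // one per successful readLine() call.
    static int countLines(String text) {
        int count = 0;
        try (LineNumberReader reader = new LineNumberReader(new StringReader(text))) {
            while (reader.readLine() != null) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for binary content: a single visible word
        // surrounded by control characters, some of which happen to be
        // line terminators.
        String pseudoBinary = "\u0001\u0002\n\u0003\rHello\u0004\r\n\u0005";
        System.out.println(countLines(pseudoBinary)); // prints 4
    }
}
```

A binary file usually contains many bytes that decode to `\n` or `\r` by coincidence, so the reported number has little to do with the lines you see in Word.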

There are libraries out there that handle such files, for example Apache POI.

If you just want to do this for experimentation and learning, then it might be easier to just stick with simple text files (as produced by Notepad, for example).
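For a plain text file, counting lines could look roughly like the sketch below. The class name, the helper `countSample`, and the choice of UTF-8 are assumptions for illustration; use whatever encoding your file was actually saved in.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class PlainTextLineCount {

    // Counts the lines of a plain text file, decoding it as UTF-8 (an assumption).
    static int countLines(Path file) {
        int count = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            while (reader.readLine() != null) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    // Illustrative helper: writes the text to a temporary file, counts its
    // lines, then deletes the file again.
    static int countSample(String text) {
        try {
            Path tmp = Files.createTempFile("sample", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return countLines(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countSample("first line\nsecond line\nthird line\n")); // prints 3
    }
}
```

Here the count matches what a text editor shows, because every line terminator in the file really is a line break.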

(*) even if there is no such thing as plain text.

Joachim Sauer
3

If it is a Microsoft Word document, it is a binary file, so you can't count its lines that way.

You need to find an appropriate API for Microsoft Word files.

YOU
2

Your problem is that you are looking at a .doc file, which is not stored as plain text. To find the number of lines in a Microsoft Word file, you are going to have to use a dedicated library.

The file format is available at www.wotsit.org, but I doubt that this alone will help you...

Martin Milan
1

You may also use the OpenOffice API to access the contents of Office documents. See the FAQ on the OpenOffice.org API.

elou