
I want to read .txt, .doc and .docx files and print their contents. When I run the code below, some .doc and .txt files are read, but many files cannot be read.

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    import javax.swing.JFileChooser;
    import javax.swing.JOptionPane;

    public class FindYourDocx
    {
        public static void main(String[] args)
        {
            // Let the user pick a file, starting in the current directory
            JFileChooser openFile = new JFileChooser();
            openFile.setCurrentDirectory(new File("."));
            if (openFile.showOpenDialog(null) != JFileChooser.APPROVE_OPTION) {
                return; // dialog was cancelled, nothing to read
            }
            File f = openFile.getSelectedFile();
            JOptionPane.showMessageDialog(null, f);

            StringBuilder text = new StringBuilder();
            char[] buffer = new char[1024 * 1024];

            // FileReader decodes the bytes as characters (platform default
            // charset), so this only makes sense for plain-text files like .txt
            try (BufferedReader br = new BufferedReader(new FileReader(f))) {
                int read;
                // read() returns -1 at end of file; a short read is not EOF,
                // and new String(buffer, 0, -1) would throw
                while ((read = br.read(buffer, 0, buffer.length)) != -1) {
                    text.append(buffer, 0, read);
                }
            } catch (IOException ex) {
                ex.printStackTrace();
            }
            System.out.println("Follows " + text);
        }
    }

Running the code above on some files, I got weird output like this:
https://i.stack.imgur.com/RwNWM.jpg

Can someone please help me solve these issues?

To read .docx files, I came across something called XWPFDocument, using "apacheio". What is this?

Johnny000

2 Answers


First of all, you should think about your problem: what do the different file types look like on disk, what is their structure, what content would you like to print, and what does "printing" mean at all? What you are doing is reading files, treating them as text and printing them to STDOUT. Is that what "printing" means in your case? I interpret "printing" as sending content to a printer and getting it on paper.

Another hint: .doc and .docx are binary files that contain "printable" text "somewhere". You can't just read the raw bytes and treat them as text. You need to know what those file formats look like, where the content is stored, and so on. Java can't do that out of the box; you need additional libraries to parse those file formats and work with them.
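To see what "text somewhere" means in practice, here is a crude sketch (stdlib only; the class and method names are hypothetical) that scans any file for runs of printable ASCII bytes, similar to the Unix `strings` tool. It is a heuristic, not a parser: it knows nothing about where a Word document keeps its paragraphs, it misses text stored as UTF-16, and on a .docx you will typically see little more than ZIP entry names, because the actual XML content is compressed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class PrintableRuns {
    // Returns every run of at least minLen printable ASCII bytes (0x20..0x7E).
    // Crude "strings"-style heuristic, NOT a Word document parser.
    static List<String> printableRuns(byte[] data, int minLen) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (byte b : data) {
            int c = b & 0xFF; // treat the byte as unsigned
            if (c >= 0x20 && c <= 0x7E) {
                current.append((char) c);
            } else {
                // A non-printable byte ends the current run
                if (current.length() >= minLen) {
                    runs.add(current.toString());
                }
                current.setLength(0);
            }
        }
        // Flush a run that reaches the end of the data
        if (current.length() >= minLen) {
            runs.add(current.toString());
        }
        return runs;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PrintableRuns some-file.doc
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        for (String run : printableRuns(data, 4)) {
            System.out.println(run);
        }
    }
}
```

Running it on a .doc and on a .docx side by side shows why reading either one with a `FileReader` produces mostly noise.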

There are many tutorials and questions about formats like docx:

How to read docx file content in java api using poi jar

Thorsten Schöning
  • Sorry friend, I can't follow what you said... Can you please post code to read .doc and .docx files? I am a beginner in Java, so please help me solve this problem ;( – user2576388 Oct 26 '13 at 14:23
  • I have no source code; I just wanted to get you to think about your problem: is it enough to read some bytes from any file to print its contents in a human-friendly way? No, it's not. You need to think about the different file formats and find libraries that can parse them. – Thorsten Schöning Oct 26 '13 at 14:37

> to read .docx i came across something like XWPFDocument using apacheio ....what is this ?

You mean Apache POI. To find out more, check the website. In brief, both Apache POI and docx4j (which I note you have tagged) are Java libraries aimed at reading, manipulating, and writing Microsoft Office files.

'doc' files are Microsoft proprietary binary files. If you try to read them in and display them using the Java IO API alone, all you will see is a representation of the binary data. It won't be useful to you. You need to use an API specifically for loading up and traversing Word files, which is where Apache POI or docx4j come in.
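Before handing a file to a parsing library you need to know which format it actually is, and the file extension is not always trustworthy. A minimal sketch, assuming you only care about telling the two Word formats apart (the class and enum names are hypothetical): legacy .doc files are OLE2 compound files and start with the signature `D0 CF 11 E0 A1 B1 1A E1`, while .docx files are ZIP archives and start with `PK\3\4`. Note that both signatures are shared with other formats (.xls/.ppt are also OLE2, and .xlsx or .jar are also ZIP), so this only narrows things down.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public class WordFormatSniffer {
    enum Kind { OLE2_COMPOUND, ZIP_ARCHIVE, UNKNOWN }

    // Compound File Binary (OLE2) signature, used by legacy .doc files
    static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1 };
    // ZIP local file header signature "PK\3\4", which starts a .docx
    static final byte[] ZIP = { 'P', 'K', 0x03, 0x04 };

    // Classifies a file from its first bytes
    static Kind sniff(byte[] header) {
        if (startsWith(header, OLE2)) return Kind.OLE2_COMPOUND;
        if (startsWith(header, ZIP)) return Kind.ZIP_ARCHIVE;
        return Kind.UNKNOWN;
    }

    static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            byte[] header = new byte[8];
            int n;
            try (InputStream in = Files.newInputStream(Paths.get(name))) {
                n = in.readNBytes(header, 0, header.length); // Java 9+
            }
            System.out.println(name + ": " + sniff(Arrays.copyOf(header, n)));
        }
    }
}
```

In Apache POI terms, an OLE2 Word file is handled by the HWPF component and a ZIP-based one by XWPF (which is where `XWPFDocument` comes from); plain .txt files typically come out as `UNKNOWN` and can be read as text.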

'docx' files are a newer XML-based Microsoft Office format. A docx file is essentially a zipped folder containing the various assets that make up a Word file.
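To make the "zipped folder" point concrete, here is a minimal sketch using only `java.util.zip` (Java 9+; the class and method names are hypothetical). It lists the parts inside a .docx and then naively pulls the text runs (`<w:t>` elements) out of `word/document.xml`, assumed here to be the main document part. It ignores paragraph breaks, tables, headers and XML entities such as `&amp;`, so treat it as an illustration of the file structure, not a replacement for Apache POI or docx4j.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

public class PeekInsideDocx {
    // Matches <w:t>...</w:t> and <w:t xml:space="preserve">...</w:t> text runs
    private static final Pattern TEXT_RUN =
        Pattern.compile("<w:t(?:\\s[^>]*)?>([^<]*)</w:t>");

    // Naively concatenates the text runs found in word/document.xml
    static String naiveText(InputStream docx) throws IOException {
        StringBuilder out = new StringBuilder();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("word/document.xml")) {
                    // readAllBytes() stops at the end of the current entry
                    String xml = new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                    Matcher m = TEXT_RUN.matcher(xml);
                    while (m.find()) {
                        out.append(m.group(1));
                    }
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // 1) A .docx is just a ZIP archive: list the parts it contains
        try (ZipFile zf = new ZipFile(args[0])) {
            zf.stream().forEach(e -> System.out.println("part: " + e.getName()));
        }
        // 2) Print the raw text runs of the main document part
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(naiveText(in));
        }
    }
}
```

Listing the parts typically shows entries such as `[Content_Types].xml` and `word/document.xml`, which is the structure the libraries below parse properly for you.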

As I said, in order to read a Word file properly, you will need to use one of the libraries mentioned. Both the Apache and docx4j websites contain plenty of example code to get you started opening and traversing Word documents (note that POI can work with the older .doc format, whereas docx4j is only for .docx files).

http://www.docx4java.org

http://poi.apache.org

Ben