I have some .doc and .pdf files, and my requirement is to read a particular page from such a file, with the page number provided at run time. This would be possible by reading page by page if each page ended with a page number, but I sometimes get documents that have no page numbering. How can I do that in this case?

Is there any API or any other logic with which I can fix this problem?

Hello all, I have a .doc file, but I am not supposed to read the entire file; instead I am given a page number, so I have to read only that particular page from the .doc file. I am using the Apache POI API.

     // Open the .doc file as a raw byte stream
     File file = new File("c://doc/assignment/afternoon_24.doc");
     FileInputStream fis = new FileInputStream(file);

I need to read page X of this file and write it to a text file. How can I do that?

loknath
  • Concerning pdf files: There are multiple PDF libraries, and many of them allow for text extraction from individual pages. Are there any additional requirements? Licenses? Budget? Libraries already in use? – mkl Mar 19 '14 at 10:40
  • @mkl In our project, reading PDF is a secondary requirement; the main question is how to do this for .doc. – loknath Mar 19 '14 at 10:56

1 Answer

I guess there is a misunderstanding: you cannot read a DOC (or PDF) simply as an InputStream and skip pages (unless you know and evaluate the file format). Both kinds of file have a format of their own that encodes the formatting and meta information in binary structures. Just try to open a PDF in Notepad or another plain text editor and you will see it.
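To make the point about file formats concrete, here is a minimal sketch that only looks at the first bytes of a file and reports which container it appears to be. It relies on two well-known signatures: a PDF starts with the ASCII text `%PDF-`, and a binary .doc is an OLE2 compound file starting with the bytes `D0 CF 11 E0 A1 B1 1A E1`. The class and method names (`FormatSniffer`, `detect`, `detectFile`) are made up for this illustration; it does not read any page content.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of Apache POI or PDFBox.
class FormatSniffer {
    // Signature of an OLE2 compound file, the container used by binary .doc files.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // A PDF file starts with the ASCII text "%PDF-".
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Classifies a file by its leading bytes: "PDF", "OLE2" or "UNKNOWN".
    static String detect(byte[] header) {
        if (startsWith(header, PDF_MAGIC)) return "PDF";
        if (startsWith(header, OLE2_MAGIC)) return "OLE2";
        return "UNKNOWN";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    // Reads at most the first 8 bytes of the file and classifies them.
    static String detectFile(Path path) throws IOException {
        byte[] header = new byte[8];
        int total = 0;
        try (InputStream in = Files.newInputStream(path)) {
            while (total < header.length) {
                int n = in.read(header, total, header.length - total);
                if (n < 0) break; // file is shorter than 8 bytes
                total += n;
            }
        }
        return detect(Arrays.copyOf(header, total));
    }

    public static void main(String[] args) throws IOException {
        // Usage: java FormatSniffer <path-to-file>
        System.out.println(detectFile(Paths.get(args[0])));
    }
}
```

Everything after those first bytes is structured binary data, which is why a plain stream cannot simply be advanced to "page X".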

As mkl suggested: to access the contents of a DOC (or PDF) you need a library that can handle that file format. For Microsoft Office formats there is, for example, the open source library Apache POI; for PDF there is, for example, PDFBox, among others, and a full thread about it. There are different libraries for each of the formats, with different features and licensing models.
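Once a library has extracted the text, picking "page X" and writing it to a text file is the easy part. The sketch below assumes, and this is only an assumption, that the extracted text marks page boundaries with a form-feed character (`\f`). That usually covers only explicit page breaks; page boundaries that a word processor computes at layout time are not stored as characters, so for such documents this approach will not match what you see on screen. `PageSelector`, `page` and the output file name `page.txt` are hypothetical names for this illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Apache POI or PDFBox.
class PageSelector {
    // Returns the text of the 1-based page number, assuming pages are
    // separated by form-feed characters in the already extracted text.
    static String page(String extractedText, int pageNumber) {
        String[] pages = extractedText.split("\f", -1);
        if (pageNumber < 1 || pageNumber > pages.length) {
            throw new IllegalArgumentException(
                "Page " + pageNumber + " not found, document has " + pages.length + " page(s)");
        }
        return pages[pageNumber - 1];
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for text that a format-aware library would return.
        String extracted = "first page\fsecond page\fthird page";
        String wanted = page(extracted, 2);
        // Write the selected page to a plain text file.
        Files.write(Paths.get("page.txt"), wanted.getBytes(StandardCharsets.UTF_8));
    }
}
```

For PDF the situation is simpler, because pages are real objects in the file and PDF libraries typically let you restrict text extraction to a page range directly.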

Community
SCI