
My goal is to read a .docx file and display its text on the view (web page).

I am using Apache POI to read a .docx file in a Grails application. Please suggest a way to display the output on the view without losing blank spaces and line breaks.

My .docx document content:

This is a .docx document ...
this is second line
this is third line

Output in the Groovy console when I print the text after reading it:

This is a .docx document ...
this is second line
this is third line

But when I pass the output to the view, it becomes:

This is a .docx document ... this is second line this is third line


My code:

    import org.apache.poi.xwpf.usermodel.XWPFDocument
    import org.apache.poi.xwpf.extractor.XWPFWordExtractor

    ...
            // Open the .docx file and extract its plain text with POI
            File docFile = new File("E:\\Query.docx")
            FileInputStream fis = new FileInputStream(docFile)
            XWPFDocument doc = new XWPFDocument(fis)
            XWPFWordExtractor docExtractor = new XWPFWordExtractor(doc)
            println docExtractor.getText()
    ...

If someone can suggest a way to iterate through each line of the document, I can easily get the result I need. Please help, I'm stuck.
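One way to go line by line is to split the String returned by `getText()` on line terminators. This is a minimal sketch, written in Java (it runs on the same JVM as the Groovy snippet) and kept to plain strings so it runs without POI on the classpath; the class and method names are hypothetical. If you want to work on the POI model instead, `XWPFDocument.getParagraphs()` returns the document's paragraphs, each of which has a `getText()` method.

```java
import java.util.Arrays;
import java.util.List;

class ExtractedLines {
    // Hypothetical helper: splits extracted text into lines.
    // "\\R" matches any line terminator (\n, \r\n, \r, ...).
    // String.split drops trailing empty strings, so a final newline does not
    // produce an extra empty line, while blank lines in the middle are kept.
    static List<String> toLines(String text) {
        return Arrays.asList(text.split("\\R"));
    }

    public static void main(String[] args) {
        // Sample text standing in for docExtractor.getText()
        String text = "This is a .docx document ...\nthis is second line\nthis is third line\n";
        List<String> lines = toLines(text);
        for (int i = 0; i < lines.size(); i++) {
            System.out.println((i + 1) + ": " + lines.get(i));
        }
    }
}
```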

vishu
  • Do you not just need to add some newline characters when printing? – Gagravarr Oct 08 '12 at 09:57
  • @Gagravarr: I need to display the text on the web page exactly as it was read from the file, but when I do, the line breaks disappear. – vishu Oct 08 '12 at 10:00
  • Why not use something which does this already? I believe [docx4j](http://www.docx4java.org/trac/docx4j) does what you're trying to do (example [here](https://github.com/plutext/docx4j/blob/master/src/main/java/org/docx4j/samples/ConvertOutHtml.java)). I haven't tried it myself, though. – tim_yates Oct 08 '12 at 10:04
  • @tim_yates: I am using POI because I need to read both .doc and .docx files. I got the desired result with .doc files but I'm stuck with .docx files, and I'm wary of jar conflicts like the one I hit yesterday when using POI in my Grails application alongside other plugins' jars. – vishu Oct 08 '12 at 10:12
  • HTML ignores line breaks; you'll need to replace them with div or p tags, or something like that. – Gagravarr Oct 08 '12 at 10:18
  • @Gagravarr: Thanks for your comment, which was the solution for me :) I solved my problem by replacing '\n' with an HTML line break tag. – vishu Oct 08 '12 at 10:38
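The fix from the comments, turning newlines into HTML line breaks, might look like the following Java sketch. It assumes the text has already been extracted into a String; `escapeHtml` and `toHtml` are hypothetical helper names. The text is escaped before the tags are inserted, since a document containing `<` or `&` would otherwise be read as markup, and escaping afterwards would turn the `<br />` tags themselves into visible text. For the same reason, if your view HTML-encodes the expression it outputs, the inserted tags would be escaped too, so check how the view renders the string.

```java
class NewlinesToBr {
    // Escapes the characters HTML treats specially so document text such as
    // "a < b" is displayed literally instead of being parsed as markup.
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Escape first, then replace every line terminator ("\\R" also matches
    // \r\n as a single unit) with a <br /> tag followed by a newline.
    static String toHtml(String text) {
        return escapeHtml(text).replaceAll("\\R", "<br />\n");
    }

    public static void main(String[] args) {
        String text = "This is a .docx document ...\nthis is second line\nthis is third line";
        System.out.println(toHtml(text));
    }
}
```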

1 Answer


HTML ignores line breaks: like any other run of whitespace, they are collapsed into a single space. So, while a string like "Hello there\nLine 2\n" renders fine in the console as

Hello there
Line 2

in HTML it will all show on the same line. You'll need to replace the newline characters with suitable HTML, e.g. <br />, or wrap each line in paragraph/div tags.
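The paragraph-tag alternative could be sketched in Java as below; `toParagraphs` and `escape` are hypothetical names, and the `&nbsp;` placeholder for blank lines is an assumption of this sketch, used because an empty `<p></p>` normally takes up no vertical space in a browser. Note that neither approach keeps runs of several spaces, which HTML also collapses; wrapping the text in `<pre>` or styling its container with the CSS `white-space: pre-wrap` property preserves both spaces and line breaks.

```java
import java.util.StringJoiner;

class LinesToParagraphs {
    // Minimal escaping for element content. '&' is replaced first so the
    // entities introduced by the later replacements are not escaped again.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Wraps each line in its own <p> element. Blank lines become <p>&nbsp;</p>
    // so they still occupy space; trailing empty lines are dropped by split.
    static String toParagraphs(String text) {
        StringJoiner html = new StringJoiner("\n");
        for (String line : text.split("\\R")) {
            html.add("<p>" + (line.isEmpty() ? "&nbsp;" : escape(line)) + "</p>");
        }
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.println(toParagraphs("Hello there\nLine 2\n"));
    }
}
```

For the answer's example string, this prints `<p>Hello there</p>` and `<p>Line 2</p>` on separate lines.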

Gagravarr