
So, basically, I have a docx file. I need to make some formatting changes to a few paragraphs and then save the result to a new file. What I am doing is essentially the following:

```scala
import java.io.{File, FileInputStream, FileOutputStream}
import scala.jdk.CollectionConverters._ // Scala 2.13+; older Scala used scala.collection.JavaConversions._
import org.apache.poi.xwpf.usermodel._

def format(sourceDocumentPath: String, outputDocumentPath: String): Unit = {

  val sourceXWPFDocument = new XWPFDocument(new FileInputStream(sourceDocumentPath))

  // let's say I have a list of (zero-based) paragraph indices I want to format
  val parasToFormat = List(2, 10, 15, 20)

  val allParagraphs = sourceXWPFDocument.getParagraphs.asScala

  for ((paragraph, index) <- allParagraphs.zipWithIndex) {
    if (parasToFormat.contains(index)) {
      formatParagraph(paragraph)
    }
  }

  // write the modified document to the new file
  val outputDocx = new FileOutputStream(new File(outputDocumentPath))
  try sourceXWPFDocument.write(outputDocx)
  finally outputDocx.close()
}

def formatParagraph(paragraph: XWPFParagraph): Unit = {
  // Do some color changing to a few runs
  // Add a few runs with new text
}
```
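For illustration, one possible body for `formatParagraph` is sketched below. The original only says "change the color of a few runs" and "add a few runs with new text", so which runs get recolored (here: all existing ones), the color `FF0000`, and the appended text `" (edited)"` are hypothetical placeholders. It uses the real POI calls `XWPFParagraph.getRuns`, `XWPFRun.setColor`, `XWPFParagraph.createRun` and `XWPFRun.setText`.

```scala
import org.apache.poi.xwpf.usermodel.XWPFParagraph
import scala.jdk.CollectionConverters._

// Hypothetical formatParagraph body: recolors every existing run and appends one new run.
def formatParagraph(paragraph: XWPFParagraph): Unit = {
  // Recolor the runs that already exist; setColor takes a hex RGB string without '#'
  for (run <- paragraph.getRuns.asScala) run.setColor("FF0000")

  // Append a new run carrying new text at the end of the paragraph (placeholder text)
  val newRun = paragraph.createRun()
  newRun.setText(" (edited)")
}
```

Note that the new run is created after the recoloring loop, so it keeps the default color.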

For the most part everything works fine. The output docx opens all right in LibreOffice on my Ubuntu machine.

But when I transfer this output docx to a Windows system and try to open it in MS Word, I get an infinite (ever-growing) number of garbage pages.

Any guesses from the wise ones of the POI community are welcome.

Also, one of my guesses is that the line endings in the files may be confusing MS Word, since Ubuntu uses LF (\n) line endings whereas Windows uses CRLF (\r\n). If this is actually the issue, how do I fix it?

Though my code is in Scala, I think the same should apply to Java code as well, and most POI users are in the Java community, so I am also adding the Java tag.

sarveshseri
  • Does anyone have a guess? – sarveshseri Apr 08 '15 at 06:28
  • Have you tried changing the line endings to the Windows version? It would either confirm or deny your suspicion that the line endings are the problem. That way people can either be pointed down the right path by a confirmation, or not waste time going down a wrong path if the problem is something else. – Davis Broda Apr 09 '15 at 12:48
  • Well, we know that `docx` files are practically zip files containing various `xml`s. I can change the line endings in all the xml files, but I am not really sure how to correctly create a `docx` file out of these modified `xml`s. Which means we would need to somehow force the line endings while writing to the `FileOutputStream`. – sarveshseri Apr 09 '15 at 12:55
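For what it is worth, the XML 1.0 specification (section 2.11, End-of-Line Handling) requires parsers to normalize both CRLF and a lone CR to LF before parsing, and `FileOutputStream` writes raw bytes with no OS-dependent translation, so line endings inside the XML parts are an unlikely cause. To rule the theory out empirically without rebuilding the `docx`, a small diagnostic can open the file as an ordinary ZIP archive with `java.util.zip.ZipFile` and list the parts that contain any `\r` byte. This is a sketch; the function name `partsWithCarriageReturns` is hypothetical.

```scala
import java.util.zip.ZipFile
import scala.jdk.CollectionConverters._

// Hypothetical diagnostic: returns the names of the XML parts (".xml" and ".rels")
// inside a .docx that contain at least one carriage-return byte ('\r').
def partsWithCarriageReturns(docxPath: String): List[String] = {
  val zip = new ZipFile(docxPath) // a .docx is an ordinary ZIP archive
  try {
    zip.entries().asScala
      .filter(e => e.getName.endsWith(".xml") || e.getName.endsWith(".rels"))
      .filter { e =>
        val in = zip.getInputStream(e)
        try in.readAllBytes().contains('\r'.toByte) // readAllBytes needs Java 9+
        finally in.close()
      }
      .map(_.getName)
      .toList // force the lazy iterator before the archive is closed
  } finally zip.close()
}
```

Usage: `partsWithCarriageReturns("output.docx").foreach(println)`; an empty list means no part contains CR characters at all.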

1 Answer


Well, I tried various things and finally solved the issue.

Basically, the problem was caused by the following very simple thing:

```scala
import org.apache.poi.xwpf.usermodel.XWPFRun

def copyRunFontSizeAttribute(sourceRun: XWPFRun, targetRun: XWPFRun): Unit = {
  targetRun.setFontSize(sourceRun.getFontSize)
}
```

Somehow, setting the font size of one XWPFRun instance (say, xWPFRunTarget) to the value returned by xWPFRunSource.getFontSize (where xWPFRunSource is another XWPFRun instance) causes some very weird and unexpected results.

So, for the moment, I removed every place where I was doing this copyRunFontSizeAttribute thing, and that solved the issue.
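If the font size does need to be copied, a guarded variant might avoid the problem. This rests on an assumption the answer above does not confirm: in the POI 3.x sources, at least, `XWPFRun.getFontSize` returns -1 when the run has no explicit size of its own (for example, when the size comes from a style), and `setFontSize(-1)` then writes a negative half-point value (`w:sz w:val="-2"`) into the run properties. An invalid value like that could plausibly be tolerated by LibreOffice and mishandled by Word, which would fit the symptoms. The name `copyRunFontSizeIfSet` is hypothetical.

```scala
import org.apache.poi.xwpf.usermodel.XWPFRun

// Hypothetical guarded copy: only copies the size when the source run has an explicit one.
// getFontSize returns -1 when no size is set directly on the run (assumption noted above),
// and passing that -1 to setFontSize would write a negative w:sz value.
def copyRunFontSizeIfSet(sourceRun: XWPFRun, targetRun: XWPFRun): Unit = {
  val size = sourceRun.getFontSize
  if (size > 0) targetRun.setFontSize(size)
}
```

Skipping the copy means the target run falls back to whatever size its own style or the document defaults give it, which may differ from the size the source run inherits; resolving an inherited size would require walking the style hierarchy, which this sketch does not attempt.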

sarveshseri