I have a .docx file that contains a single table. I want to remove all text from rows 2 to the end. However, the method myTable.getRow(somecounter).getCell(somecounter2).setText("") doesn't work, as it only appends the empty string to the existing value. I also tried creating an XWPFRun with myTable.getRow(sc).getCell(sc2).getParagraphs().get(0).createRun() and calling run.setText("") on it, but that doesn't work either.

Also tried the solution from this thread, no luck this time :(

Any ideas how to easily remove text from the cell? My idea is to make a new table from scratch and fill it with content but it seems really arduous.

Karatte

1 Answer

Your requirement "to remove all text from rows 2 to the end" is a little complicated to fulfil, since a Word table cell can contain many things other than text.

Consider the following table:

[image: example table]

So if the requirement is to remove all content from rows 2 to the end, you could simply replace all cells with new, clean ones, or at least with ones that contain only an empty paragraph.

import java.io.FileOutputStream;
import java.io.FileInputStream;

import java.util.List;

import org.apache.poi.xwpf.usermodel.*;

import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTTc;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTP;

/*
needs the full ooxml-schemas-1.3.jar as mentioned in https://poi.apache.org/faq.html#faq-N10025
since the CTRowImpl is not fully shipped with poi-ooxml-schemas-3.13-*.jar
*/

public class WordCleanTableRows {

 public static void main(String[] args) throws Exception {

  FileInputStream fis = new FileInputStream("document.docx");
  XWPFDocument doc = new XWPFDocument(fis);

  List<XWPFTable> tables = doc.getTables();
  XWPFTable table = tables.get(0);

  XWPFTableRow[] rows = table.getRows().toArray(new XWPFTableRow[0]);
  for (int r = 0; r < rows.length; r++) {
   if (r > 0) {
    XWPFTableRow row = rows[r];
    CTTc[] cells = row.getCtRow().getTcList().toArray(new CTTc[0]);
    for (int c = 0; c < cells.length; c++) {
     CTTc cTTc = cells[c];
     //clear only the paragraphs in the cell, keep cell styles
     cTTc.setPArray(new CTP[] {CTP.Factory.newInstance()});
     cells[c] = cTTc;
    }
    row.getCtRow().setTcArray(cells);
    //System.out.println(row.getCtRow());
   }
  }

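  //close both streams explicitly so the output file is flushed and released
  try (FileOutputStream out = new FileOutputStream("new document.docx")) {
   doc.write(out);
  }
  fis.close();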

 }
}

This needs the full ooxml-schemas-1.3.jar as mentioned in https://poi.apache.org/faq.html#faq-N10025 since the CTRowImpl is not fully shipped with poi-ooxml-schemas-3.13-*.jar.

Without the full ooxml-schemas-1.3.jar you could simply remove all rows except the first one and add new ones.

import java.io.FileOutputStream;
import java.io.FileInputStream;

import java.util.List;

import org.apache.poi.xwpf.usermodel.*;

public class WordCleanTableRows2 {

 public static void main(String[] args) throws Exception {

  FileInputStream fis = new FileInputStream("document.docx");
  XWPFDocument doc = new XWPFDocument(fis);

  List<XWPFTable> tables = doc.getTables();
  XWPFTable table = tables.get(0);

  XWPFTableRow[] rows = table.getRows().toArray(new XWPFTableRow[0]);
  for (int r = 0; r < rows.length; r++) {
   if (r > 0) {
    table.removeRow(1); //always remove the second row; the others shift upwards
    table.createRow(); //add a new empty row at the end
   }
  }

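  //close both streams explicitly so the output file is flushed and released
  try (FileOutputStream out = new FileOutputStream("new document.docx")) {
   doc.write(out);
  }
  fis.close();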

 }
}
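
The loop above removes index 1 on every pass instead of index r. That is enough because, after each removal, the remaining rows shift up by one, so the next row to delete sits at index 1 again. A minimal sketch of that idea, assuming a plain java.util.List<String> as a stand-in for the table (the class RowShiftSketch and the method clearAllButFirst are hypothetical names for illustration, not part of POI):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowShiftSketch {

 // Stand-in for the table: each String plays the role of one row.
 // remove(1) mirrors table.removeRow(1), add("") mirrors table.createRow().
 static List<String> clearAllButFirst(List<String> rows) {
  List<String> result = new ArrayList<>(rows);
  int rowsToReplace = result.size() - 1; // every row except the first
  for (int i = 0; i < rowsToReplace; i++) {
   result.remove(1); // always index 1: later rows shift upwards
   result.add("");   // append an empty replacement row at the end
  }
  return result;
 }

 public static void main(String[] args) {
  List<String> rows = Arrays.asList("header", "a", "b", "c");
  System.out.println(clearAllButFirst(rows));
 }
}
```

The first element survives and the list keeps its length, which is the same effect the removeRow/createRow pair has on the row count of the real table.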

Edit:

The following should work without ooxml-schemas-1.3.jar and do the same as my first example.

import java.io.FileOutputStream;
import java.io.FileInputStream;

import java.util.List;

import org.apache.poi.xwpf.usermodel.*;

import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTP;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.STOnOff;

import java.math.BigInteger;

public class WordCleanTableRows3 {

 public static void main(String[] args) throws Exception {

  FileInputStream fis = new FileInputStream("document.docx");
  XWPFDocument doc = new XWPFDocument(fis);

  List<XWPFTable> tables = doc.getTables();
  XWPFTable table = tables.get(0);

  XWPFTableRow[] rows = table.getRows().toArray(new XWPFTableRow[0]);
  for (int r = 0; r < rows.length; r++) {
   if (r > 0) {
    XWPFTableRow row = rows[r];
    List<XWPFTableCell> cells = row.getTableCells();
    for (XWPFTableCell cell : cells) {
     //get CTTc and replace the CTPArray with one empty CTP
     cell.getCTTc().setPArray(new CTP[] {CTP.Factory.newInstance()});

     //set some default styles for the paragraphs in the cells:
     //http://grepcode.com/file/repo1.maven.org/maven2/org.apache.poi/ooxml-schemas/1.1/org/openxmlformats/schemas/wordprocessingml/x2006/main/CTParaRPr.java  
     CTP cTP = cell.getCTTc().getPArray(0);
     cTP.addNewPPr();
     cTP.getPPr().addNewRPr();
     cTP.getPPr().getRPr().addNewB().setVal(STOnOff.ON);
     cTP.getPPr().getRPr().addNewColor().setVal("FF0000");
     cTP.getPPr().getRPr().addNewSz().setVal(BigInteger.valueOf(40));
    }
   }
  }

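  //close both streams explicitly so the output file is flushed and released
  try (FileOutputStream out = new FileOutputStream("new document.docx")) {
   doc.write(out);
  }
  fis.close();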

 }
}

The org.openxmlformats.schemas.wordprocessingml.x2006.main.CTP is shipped with poi-ooxml-schemas-3.13-*.jar.

Axel Richter
  • The requirement is to remove all text; however, the cells contain only plain text without additional content such as images or even whitespace, so it doesn't matter whether I remove text or content. Your solution seems fine, although it gives me `java.lang.NoClassDefFoundError: org/openxmlformats/schemas/wordprocessingml/x2006/main/impl/CTRowImpl$1TcList` at line `CTTc[] cells = row.getCtRow().getTcList().toArray(new CTTc[0]);` It's weird because I have the newest POI release (3.13) and I have added all files from the binary release. – Karatte Feb 06 '16 at 12:18
  • Importing some obsolete .jar fixed the exception; however, along with the content, the loop removed all table borders. I tried to fix this with `table.setInsideHBorder(XWPFBorderType.SINGLE, 4, 0, "000000")`, but this obviously affects only the inside borders, so the external ones are still invisible. Also, the page orientation is vertical while it should be horizontal :( – Karatte Feb 06 '16 at 12:35
  • See my supplements. The approaches remove complete cells, so of course all cell styles are removed too. I have now provided an approach, in my first example, which clears only the paragraphs in the cells and keeps the cell styles. But I can't reproduce how removing and inserting table cells would affect the page orientation. – Axel Richter Feb 06 '16 at 13:28
  • The first approach works like a charm, so I don't need to change it. Apart from the missing external borders and the wrong page orientation, the rest is OK. Is it possible to change text formatting (font, bold, etc.) inside a cell using your approach? Besides, thank you very much! – Karatte Feb 06 '16 at 15:56