Not able to replace a text in PDF using PDFBox 2.0.2

Question

My requirements

1) I need to identify a particular text pattern
2) Then replace that text pattern with pre-defined text-value with the same format of text pattern, such as font, font colour, bold …

3) I am able to identify the text and replace it with the predefined values, but writing the result back to the PDF fails.
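
The replaceFields helper called in the code further down is not shown in this question. A minimal sketch of what such a helper might look like follows. The `${name}` placeholder syntax, the class name FieldReplacer and the lookup map are all assumptions made for illustration, not the actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldReplacer {
    // Assumed placeholder syntax: ${fieldName}. The real pattern is not shown in the question.
    private static final Pattern FIELD = Pattern.compile("\\$\\{([A-Za-z0-9_]+)\\}");

    private final Map<String, String> values;

    FieldReplacer(Map<String, String> values) {
        this.values = values;
    }

    /** Replaces every known placeholder; unknown placeholders are left untouched. */
    String replaceFields(String text) {
        Matcher matcher = FIELD.matcher(text);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            // Fall back to the placeholder itself when no value is defined for it.
            String replacement = values.getOrDefault(matcher.group(1), matcher.group());
            // quoteReplacement keeps '$' and '\' in the value from being treated as group references.
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("customerName", "Jane Doe");
        FieldReplacer replacer = new FieldReplacer(values);
        System.out.println(replacer.replaceFields("Dear ${customerName}, ref ${orderId}"));
    }
}
```

This only covers requirements 1 and 2 on plain strings; it says nothing about fonts or colours, which is where the PDF side gets hard.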

I tried the following 2 approaches to write to the PDF

1) By overriding writeString(String string, List<TextPosition> textPositions) of PDFTextStripper

2) By using cosArray.add(new COSString(replacedField)); or cosArray.set(…)

Results for approach 1 - By Overriding writeString

The PDF generated by this code cannot be opened in a PDF viewer. I am able to open it in Word, but none of the original text formatting is preserved.

Results for approach 2 - By using cosArray.add or cosArray.set(…)

I see only boxes in the generated PDF.

Code for approach 1 - By Overriding writeString

public void rewrite(String templatePDFPath) throws IOException {

    PDDocument document = null;

    Writer pdfWriter = null;

    try {

        File templateFile = new File(templatePDFPath);
        document = PDDocument.load(templateFile);

        this.setSortByPosition(true);
        this.setStartPage(0);
        this.setEndPage(document.getNumberOfPages());

        pdfWriter = new PrintWriter(Utils.getFilePathWithTimeStamp(templatePDFPath).toString());

        this.writeText(document, pdfWriter);

    } finally {
        if (document != null) {
            document.close();
        }

        if (pdfWriter != null) {
            pdfWriter.close();
        }
    }
}

protected void writeString(String string, List<TextPosition> textPositions) throws IOException {

    for (int i = 0; i < textPositions.size(); i++) {
        TextPosition text = textPositions.get(i);

        String currentCharcter = text.getUnicode();
        // System.out.println("String[" + text.getXDirAdj() + "," + //
        // text.getYDirAdj() + " fs=" + text.getFontSize() // + " xscale=" +
        // text.getXScale() + " height=" + // text.getHeightDir() + "
        // space=" // +
        // text.getWidthOfSpace() + " width=" + text.getWidthDirAdj() + //
        // "]" +
        // currentCharcter);

    }
    String replacedString = replaceFields(string.trim());

    if (!(string.equals(replacedString))) {
        System.out.println("Field " + string + " is replaced by value " + replacedString);
        // super.writeString(replacedString, textPositions);
        super.writeString(replacedString);
    }

}

Code for approach 2 - By using cosArray.add or cosArray.set(…)

public List<String> replaceFieldsInCosArray(COSArray cosArray) {
    List<String> replacedStrings = new ArrayList<String>();
    String stringsOfCOSArray = "";

    for (int cosArrayIndex = 0; cosArrayIndex < cosArray.size(); cosArrayIndex++) {
        Object cosObject = cosArray.get(cosArrayIndex);

        if (cosObject instanceof COSString) {
            COSString cosString = (COSString) cosObject;
            stringsOfCOSArray += cosString.getString();
        }
    }
    stringsOfCOSArray = stringsOfCOSArray.trim();

    // cosArray.clear();

    String replacedField = this.replaceFields(stringsOfCOSArray);
    System.out.println("cosText:" + stringsOfCOSArray + ":replacedField:" + replacedField);

    cosArray.add(new COSString(replacedField));

    if (!stringsOfCOSArray.equals(replacedField)) {
        replacedStrings.add(replacedField);
    }

    return replacedStrings;
}

duplicate of https://stackoverflow.com/questions/35420609/pdfbox-2-0-rc3-find-and-replace-text — Tilman Hausherr, Jul 11 '16 at 13:52

score 0 · Answer 1 · answered Jul 12 '16 at 20:49

1) By overriding writeString(String string, List<TextPosition> textPositions) of PDFTextStripper

PDFTextStripper is a tool for extracting plain text. Thus, it is not surprising that your output cannot be opened as a PDF: the file contains plain text, not PDF syntax. Word can open it because Word recognises it as plain text and opens it as such.
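
To make that concrete without involving PDFBox at all: a PDF file is expected to start with a "%PDF-" header, while text pushed through a PrintWriter starts with whatever the text starts with. The following is a minimal sketch using only the JDK; the class and method names are made up for illustration, and the header test is only a rough plausibility check, not a validation of the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class PdfHeaderCheck {
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Writes text the way the question does: through a PrintWriter into a file named *.pdf. */
    static Path writePlainText(String extractedText) {
        try {
            Path out = Files.createTempFile("stripper-output", ".pdf");
            out.toFile().deleteOnExit();
            try (PrintWriter writer = new PrintWriter(out.toFile(), "UTF-8")) {
                writer.print(extractedText);
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if the first bytes of the file are the "%PDF-" marker. */
    static boolean startsWithPdfHeader(Path file) {
        byte[] head = new byte[PDF_MAGIC.length];
        int total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (total < head.length) {
                int n = in.read(head, total, head.length - total);
                if (n < 0) {
                    break; // file is shorter than the marker
                }
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total == head.length && Arrays.equals(head, PDF_MAGIC);
    }

    public static void main(String[] args) {
        Path out = writePlainText("Dear Jane Doe, ...");
        System.out.println("Looks like a PDF: " + startsWithPdfHeader(out));
    }
}
```

The ".pdf" extension does not change anything here: the file still contains only the extracted characters.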

2) By using cosArray.add(new COSString(replacedField)); or cosArray.set(…)

It is not really clear what you mean here. In particular, which cosArray are you talking about?

One might assume you mean the parameter of the TJ operator but there are multiple reasons against that assumption:

the TJ operator is but one of several text-showing operators and the only one accepting an array argument; thus, you would look at only a few of the operations in question;
your code would assume that the whole text pattern you try to identify is drawn by the same operation; why should it be?
you seem to assume that cosString.getString() returns something intelligible; unfortunately, that is not the case in general, merely if the fonts in question use some standard encoding, which has become less and less common;
furthermore, you assume that the glyphs for your replacement text are contained in the font of the replaced text. Why should they be? Embedded font subsets have become more and more common...
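
The last two points can be illustrated with a toy model. This is not PDFBox code and not how real fonts are structured; it merely assumes a subset font that assigns the codes 1, 2, 3, ... to the characters it contains, in order of first use. The class name SubsetFontModel and all its methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class SubsetFontModel {
    private final Map<Character, Byte> charToCode = new LinkedHashMap<>();
    private final Map<Byte, Character> codeToChar = new LinkedHashMap<>();

    /** Builds a subset holding only the characters of usedText, numbered from 1 in order of first use. */
    SubsetFontModel(String usedText) {
        for (char c : usedText.toCharArray()) {
            if (!charToCode.containsKey(c)) {
                byte code = (byte) (charToCode.size() + 1);
                charToCode.put(c, code);
                codeToChar.put(code, c);
            }
        }
    }

    /** True only if every character of text has a glyph in this subset. */
    boolean canEncode(String text) {
        for (char c : text.toCharArray()) {
            if (!charToCode.containsKey(c)) {
                return false;
            }
        }
        return true;
    }

    /** Produces the bytes that would sit in the string operand of a text-showing operation. */
    byte[] encode(String text) {
        byte[] codes = new byte[text.length()];
        for (int i = 0; i < text.length(); i++) {
            Byte code = charToCode.get(text.charAt(i));
            if (code == null) {
                throw new IllegalArgumentException("No glyph for '" + text.charAt(i) + "' in this subset");
            }
            codes[i] = code;
        }
        return codes;
    }

    /** Maps codes back to characters using the font's own table. */
    String decode(byte[] codes) {
        StringBuilder sb = new StringBuilder();
        for (byte code : codes) {
            sb.append(codeToChar.get(code));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SubsetFontModel font = new SubsetFontModel("Hello");
        byte[] codes = font.encode("Hello");
        // Reading the raw bytes as text yields control characters, not "Hello".
        String naive = new String(codes, StandardCharsets.ISO_8859_1);
        System.out.println("naive equals Hello: " + naive.equals("Hello"));
        System.out.println("decoded via font table: " + font.decode(codes));
        System.out.println("can draw replacement 'World': " + font.canEncode("World"));
    }
}
```

In this model, searching the raw string bytes for a pattern finds nothing, and a replacement such as "World" cannot be drawn because the subset has no glyphs for 'W', 'r' and 'd'.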

Thus, what do you actually mean here?

That all being said, if you happen to work merely with naively built PDFs, you might want to look at the answer to the question @Tilman pointed you to. There is a small set of PDFs for which that code may work.

If your PDFs happen to be more sophisticated, though, even describing the approach would be beyond the scope of a single Stack Overflow answer.

By the way, your requirements are not well defined, in particular

replace that text pattern with pre-defined text-value with the same format of text pattern, such as font, font colour, bold …

If the matched text has three letters, the replacement has two letters, and the found occurrence has the first glyph in red, the second in green, and the third in blue, how should the two replacement glyphs be drawn using those three colors?
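
To show that this is a real choice and not a detail, here is a small sketch with two equally defensible policies that give different results for exactly that case. Everything in it (the StyledGlyph type, the two policy methods, colours as plain strings) is invented for illustration and has no counterpart in PDFBox.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FormatAmbiguity {
    /** A glyph together with the colour it was drawn in. */
    static final class StyledGlyph {
        final char glyph;
        final String colour;

        StyledGlyph(char glyph, String colour) {
            this.glyph = glyph;
            this.colour = colour;
        }

        @Override
        public String toString() {
            return glyph + "/" + colour;
        }
    }

    /** Policy A: every replacement glyph takes the colour of the first original glyph. */
    static List<StyledGlyph> useFirstColour(List<StyledGlyph> original, String replacement) {
        List<StyledGlyph> result = new ArrayList<>();
        String colour = original.get(0).colour;
        for (char c : replacement.toCharArray()) {
            result.add(new StyledGlyph(c, colour));
        }
        return result;
    }

    /** Policy B: replacement glyph i takes the colour of original glyph i (the last colour repeats if needed). */
    static List<StyledGlyph> copyColourByIndex(List<StyledGlyph> original, String replacement) {
        List<StyledGlyph> result = new ArrayList<>();
        for (int i = 0; i < replacement.length(); i++) {
            int source = Math.min(i, original.size() - 1);
            result.add(new StyledGlyph(replacement.charAt(i), original.get(source).colour));
        }
        return result;
    }

    public static void main(String[] args) {
        List<StyledGlyph> original = Arrays.asList(
                new StyledGlyph('A', "red"),
                new StyledGlyph('B', "green"),
                new StyledGlyph('C', "blue"));
        System.out.println("Policy A: " + useFirstColour(original, "XY"));
        System.out.println("Policy B: " + copyColourByIndex(original, "XY"));
    }
}
```

Neither policy uses all three colours, and nothing in the stated requirements says which one is wanted.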
