I'm trying to find certain text in a PDF and make its font color white. As a POC, I've already succeeded in finding the text and highlighting it in the PDF, based on the code written by mkl here: find position of text in pdf

Is it, however, possible to use the received coordinates to change the font color of the text inside the rectangle instead of highlighting it? Alternatively, can I add a white rectangle to cover the text?

Thanks in advance

Edit: I have started adding the rectangles to the PDF; however, as stated, they are not in the correct position. This is what I have so far (don't mind the style, it's just a POC):

TextPositionSequence class by mkl

byte[] content = ...;
PDDocument document = PDDocument.load(content);
for (int page = 1; page <= document.getNumberOfPages(); page++) {
    List<TextPositionSequence> hits = null;
    try {
        hits = findSubwordsImproved(document, page, "[" + searchTerm + "]");
    } catch (IOException e) {
        e.printStackTrace();
    }
    for (TextPositionSequence hit : hits) {
        TextPosition lastPosition = hit.textPositionAt(hit.length() - 1);
        TextPosition firstPosition = hit.textPositionAt(0);

        PDPage actualPage = document.getPage(page - 1);

        PDRectangle cropBox = actualPage.getCropBox();

        float x = firstPosition.getTextMatrix().getTranslateX() + cropBox.getLowerLeftX();
        float y = firstPosition.getTextMatrix().getTranslateY() + cropBox.getLowerLeftY();
        float w = hit.getWidth();
        try {
            PDPageContentStream contents = new PDPageContentStream(document, actualPage, PDPageContentStream.AppendMode.APPEND, false);
            contents.setNonStrokingColor(Color.RED);
            contents.addRect(x, y, w, firstPosition.getHeight());
            contents.fill();
            contents.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Annanraen
  • It is possible to change the color of text, but it's not trivial. Adding white rectangles covering the text is much easier. In both cases, though, one can still successfully search the hidden text and copy&paste it. – mkl Nov 05 '20 at 15:36
  • It is okay that one can still copy/paste the hidden text. Any chance you can point me in the right direction for the white rectangle? At the moment I'm getting a blank page with a rectangle at the last text location, lol. I'll tinker a bit further myself too – Annanraen Nov 05 '20 at 15:57
  • I can already add the rectangles over the text. However, the rectangles are not in the correct position. I've already looked at https://stackoverflow.com/questions/46080131/text-coordinates-when-stripping-from-pdfbox, but using the crop box doesn't help – Annanraen Nov 05 '20 at 16:35
  • *"I can already add the rectangles over the text. However, the rectangles are not in the correct position."* - OK, that's a start. Please share your pivotal code. Maybe we can fix the coordinates easily. – mkl Nov 05 '20 at 16:57
  • I've edited the question with the code I currently have – Annanraen Nov 06 '20 at 06:45
  • Apparently the trick was to set `resetContext` to true and not to use the text matrix `getTranslate` methods. Thanks for your time! – Annanraen Nov 06 '20 at 07:57
  • If you don't use the `getTranslate` methods, you are likely to run into trouble as soon as you process PDFs with page rotation. – mkl Nov 06 '20 at 10:52
  • Oh, good to know. I'll keep that in mind – Annanraen Nov 06 '20 at 13:25
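
A note on the `resetContext` flag mentioned in the comments above. As far as I know, in PDFBox 2.x this flag wraps the existing page content in a save/restore of the graphics state, so that appended drawing starts from the default coordinate system. Without it, a transformation matrix left active by the page's own content stream is also applied to the appended rectangle. The sketch below only illustrates that effect with plain arithmetic; the class name `LeftoverCtmSketch` and the matrix are hypothetical and are not taken from the asker's PDF.

```java
final class LeftoverCtmSketch {

    /**
     * Applies a PDF-style transformation matrix [a b c d e f] to the point (x, y):
     * x' = a*x + c*y + e, y' = b*x + d*y + f.
     */
    static float[] apply(float[] m, float x, float y) {
        return new float[] {
            m[0] * x + m[2] * y + m[4],
            m[1] * x + m[3] * y + m[5]
        };
    }

    public static void main(String[] args) {
        // Hypothetical leftover matrix: flips the y axis on a page 792 pt high.
        float[] flipY = {1f, 0f, 0f, -1f, 0f, 792f};

        // Where a rectangle corner requested at (100, 700) would actually land
        // if this matrix were still active when the rectangle is drawn.
        float[] landed = apply(flipY, 100f, 700f);
        System.out.println(landed[0] + ", " + landed[1]);
    }
}
```

If the existing content leaves such a matrix in place, a rectangle drawn near the top of the page ends up near the bottom, which would match the "not in the correct position" symptom. Whether that was the actual cause here is an assumption.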

1 Answer

I've solved it with the following code. I fiddled with the rectangle height a bit to get the box to cover the entire text. This might need tweaking in the future:

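float posXInit = hit.getX();
float posXEnd = lastPosition.getXDirAdj() + lastPosition.getWidth();
// getYDirAdj() is measured from the top of the page, so subtract it from the page height to get a bottom-up y.
float posYInit = firstPosition.getPageHeight() - firstPosition.getYDirAdj();
float posYEnd = firstPosition.getPageHeight() - lastPosition.getYDirAdj();
float height = firstPosition.getHeight();

// The final 'true' is resetContext, so graphics state left over from the existing content does not affect the rectangle.
PDPageContentStream contents = new PDPageContentStream(document, actualPage, PDPageContentStream.AppendMode.APPEND, false, true);
contents.setNonStrokingColor(Color.WHITE);
// Start a third of the height below the text position and use twice the height so the whole text is covered.
contents.addRect(posXInit, posYEnd - height / 3, hit.getWidth(), height * 2);
contents.fill();
contents.close();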
Annanraen
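
The rectangle arithmetic in the answer above can be isolated into a small helper that is easy to test without a PDF. This is a minimal sketch, not the author's code: the class `CoverRectSketch` and the method `coverRect` are hypothetical names, the inputs are assumed to be the direction-adjusted values that `TextPosition` reports (with y measured from the top of the page), and the padding factors (a third of the height below, twice the height in total) are the hand-tuned values from the answer.

```java
final class CoverRectSketch {

    /**
     * Returns {x, y, width, height} of a covering rectangle in bottom-up page coordinates.
     *
     * xDirAdj, yDirAdj: direction-adjusted position of the text (y measured from the top)
     * textWidth:        width of the whole hit
     * glyphHeight:      height reported for the first glyph
     * pageHeight:       height of the page in the same units
     */
    static float[] coverRect(float xDirAdj, float yDirAdj, float textWidth,
                             float glyphHeight, float pageHeight) {
        // Flip the top-down y into a bottom-up y.
        float textY = pageHeight - yDirAdj;
        // Hand-tuned padding taken from the answer; may need tweaking per font.
        float y = textY - glyphHeight / 3f;
        float h = glyphHeight * 2f;
        return new float[] {xDirAdj, y, textWidth, h};
    }

    public static void main(String[] args) {
        float[] r = coverRect(72f, 100f, 50f, 9f, 792f);
        System.out.println(r[0] + ", " + r[1] + ", " + r[2] + ", " + r[3]);
    }
}
```

The four values could then be passed to `addRect` on the content stream. As mkl notes in the comments, this simple flip is likely to break on pages with rotation, so treat it as valid for unrotated pages only.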