
This is a very straightforward issue. I added an invisible text layer using page.insert_text().

After saving the modified PDF, I can use page.get_text() to retrieve the created text layer.

I would like to be able to eliminate that layer, but couldn't find a function to do it.

The solution I've come up with is rendering the pages as images and creating a new PDF from them. But it seems like a very inefficient solution.

I would like to solve this issue without using any library other than fitz, and it feels like there should be a solution within fitz, considering that page.get_text() can access the exact information I'm trying to eliminate.

José Chamorro
  • I use the term 'text layer' loosely, understanding that PDFs are not meant to hold separate layers but to fix their content in a printable format. I guess that means that, once embedded, the inserted text can't easily be removed... but it still feels odd that I can access the text and see each character's position, yet not be able to remove it. I'm just an enthusiast anyway – José Chamorro Jul 10 '22 at 02:09

1 Answer


If you are certain of the whereabouts of your text on the page (and I understood that you are), simply use PDF redactions:

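import fitz  # PyMuPDF
# fill=False: don't paint over the redacted area (the default fill is white)
page.add_redact_annot(rect1, fill=False)  # remove text inside this rectangle
page.add_redact_annot(rect2, fill=False)
...
page.apply_redactions(images=fitz.PDF_REDACT_IMAGE_NONE)
# the above removes everything intersecting any of the rects,
# but leaves images untouched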

Obviously you can remove all text on the page by taking page.rect as the redaction rectangle.

Jorj McKie
  • Is there a way to check if there is nothing to remove (except images) on this page? – Demetry Pascal Feb 27 '23 at 18:47
  • @DemetryPascal - you can check for text presence like this: `text = page.get_text()`. If `text` is empty or contains only newlines etc., then there is no text. Apart from text, there may be vector graphics, which you can extract like this: `paths = page.get_drawings()`. This returns a list, so if it is non-empty, there are graphics. That's about it. – Jorj McKie Mar 01 '23 at 04:11
  • Of course, there may just be a bunch of spaces and/or newline characters, so you may want to do `text = text.replace("\n", "").strip()` and **then** check for emptiness. – Jorj McKie Mar 01 '23 at 04:20