How can I find and then hide (or delete) a specific text phrase in a PDF?

For example, I have created a PDF file containing all sorts of data such as images, tables, text etc.

Now I want to find a specific phrase such as "Hello World" wherever it appears in the file and hide it or, better still, delete it from the PDF.

Finally, I want to save the resulting PDF with the phrase removed.

I have tried iTextSharp and Spire, but couldn't find anything that worked.

Joris Schellekens
  • @BrunoLowagie I wouldn't call it false, that's a bit harsh. I'd rather say incomplete. A factually correct statement could have been: _"I tried with an older version of iText, expecting that it would contain the functionality that was introduced in a recent version, and I was unable to get it to work."_ – Amedee Van Gasse May 16 '18 at 08:12
  • The version of iText wasn't mentioned in the post (which is another flaw of the question). However, since the OP talks about iTextSharp instead of about iText for .NET, we *could* assume that the problem is indeed caused by using an old version of iText. I didn't because the OP insinuates that he did a search and couldn't find anything. One would expect that such a search (e.g. on the iText web site) would result in a solution that works, such as the iText add-on pdfSweep. – Bruno Lowagie May 16 '18 at 08:17
  • Hi @David, please remove the links to questions about *extracting* text; they are irrelevant as an answer to a question about *redacting* text. Extracting text is getting text from a PDF without changing that PDF; redacting text is removing text from a PDF by altering the syntax of that PDF. – Bruno Lowagie May 16 '18 at 08:19
  • Have you tried [PDFSharp](http://www.pdfsharp.net/)? – Abbas May 16 '18 at 09:26

2 Answers


Try the following code snippet to hide a specific text phrase in a PDF using Spire.PDF.

using Spire.Pdf;
using Spire.Pdf.General.Find;
using System.Drawing;

namespace HideText
{
    class Program
    {
        static void Main(string[] args)
        {
            //load PDF file
            PdfDocument doc = new PdfDocument();
            doc.LoadFromFile(@"C:\Users\Administrator\Desktop\Example.pdf");

            //search every page for "Hello World"
            foreach (PdfPageBase page in doc.Pages)
            {
                PdfTextFind[] finds = page.FindText("Hello World").Finds;

                //cover every match on this page with a white background
                foreach (PdfTextFind find in finds)
                {
                    find.ApplyRecoverString("", Color.White, false);
                }
            }

            //save to file
            doc.SaveToFile("output.pdf");
        }
    }
}

Result: [image]

vaalex

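The following snippet, which uses the iText add-on pdfSweep, lets you find and redact text in a PDF document. Each match is covered with the chosen redaction color (pink in this example):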

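using System.Text.RegularExpressions;
using iText.Kernel.Colors;
using iText.Kernel.Pdf;
using iText.PdfCleanup.Autosweep;
// SRC and DEST are the paths of the input and output PDF files
PdfDocument pdf = new PdfDocument(new PdfReader(SRC), new PdfWriter(DEST));
ICleanupStrategy cleanupStrategy = new RegexBasedCleanupStrategy(new Regex(@"Alice", RegexOptions.IgnoreCase)).SetRedactionColor(ColorConstants.PINK);
PdfAutoSweep autoSweep = new PdfAutoSweep(cleanupStrategy);
autoSweep.CleanUp(pdf);
pdf.Close();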

Pay attention to the license: it is AGPL unless you buy a commercial license.
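To match the literal phrase from the question rather than a regular expression, the same pdfSweep calls can be combined with `Regex.Escape`. The following is a minimal sketch, not a definitive implementation: the file names `input.pdf` and `output.pdf` are hypothetical, the `using` lines assume the namespaces of the pdfSweep release that ships `PdfAutoSweep` (check them against your installed version), and white is chosen only so that the covered area blends into a white page.

```csharp
using System.Text.RegularExpressions;
using iText.Kernel.Colors;
using iText.Kernel.Pdf;
using iText.PdfCleanup.Autosweep;

class RedactPhrase
{
    static void Main(string[] args)
    {
        // Hypothetical input and output paths; replace with your own.
        string src = "input.pdf";
        string dest = "output.pdf";

        // Regex.Escape makes the phrase match literally, so characters such as
        // '.', '(' or '?' in the phrase are not treated as regex operators.
        Regex phrase = new Regex(Regex.Escape("Hello World"), RegexOptions.IgnoreCase);

        // Same strategy class as in the snippet above, with a white redaction color.
        ICleanupStrategy strategy = new RegexBasedCleanupStrategy(phrase)
            .SetRedactionColor(ColorConstants.WHITE);

        PdfDocument pdf = new PdfDocument(new PdfReader(src), new PdfWriter(dest));
        new PdfAutoSweep(strategy).CleanUp(pdf);

        // Closing the document writes the redacted file to dest.
        pdf.Close();
    }
}
```

A phrase that is split across lines in the PDF may not be matched as a whole, so it is worth checking the output file before relying on it.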

astef