I need to delete a text from a PDF document. I am using Aspose for the purpose
am currently using TextFragmentAbsorber
.
FYI, I cannot use any other 3rd party library.
Below is the code I am using :
private string DeleteMachineReadableCode(string inputFilePath)
{
var outputFilePath = Path.Combine(Path.GetTempPath(), string.Format(@"{0}.pdf", Guid.NewGuid()));
try
{
// Open document
Document pdfDocument = new Document(inputFilePath);
// Create TextAbsorber object to find all the phrases matching the regular expression
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("#START#((.|\r\n)*?)#END#");
// Set text search option to specify regular expression usage
TextSearchOptions textSearchOptions = new TextSearchOptions(true);
textFragmentAbsorber.TextSearchOptions = textSearchOptions;
// Accept the absorber for all pages
pdfDocument.Pages.Accept(textFragmentAbsorber);
// Get the extracted text fragments
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;
// Loop through the fragments
foreach (TextFragment textFragment in textFragmentCollection)
{
// Update text and other properties
textFragment.Text = string.Empty;
// Set to an instance of an object.
textFragment.TextState.Font = FontRepository.FindFont("Verdana");
textFragment.TextState.FontSize = 1;
textFragment.TextState.ForegroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.White);
textFragment.TextState.BackgroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.White);
}
pdfDocument.Save(outputFilePath);
}
finally
{
if (File.Exists(inputFilePath))
File.Delete(inputFilePath);
}
return outputFilePath;
}
I am able to replace the content if the content to be deleted is on a single page.
My problem is that if the text spans over multiple pages the TextFragmentAbsorber does not recognize the text with the mentioned regex pattern ("#START#((.|\r\n)*?)#END#
").
Please suggest if anything can be done on the regex or the some setting in Aspose can fix my issue.