4

I am developing a PDF reader. I want to find any string in a PDF and know the corresponding page number. I am using iTextSharp.

BenMorel
Md Kamruzzaman Sarker
  • You'll need to extract text from every page, check out PdfTextExtractor, http://stackoverflow.com/a/4893285/231316 – Chris Haas Apr 22 '12 at 15:13

2 Answers

1

Something like this should work:

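// requires: using System.Text.RegularExpressions;
//           using iTextSharp.text.pdf; using iTextSharp.text.pdf.parser;
// add any string you want to match on; Regex.Escape treats it as literal text
Regex regex = new Regex(Regex.Escape("the"),
  RegexOptions.IgnoreCase | RegexOptions.Compiled
);
PdfReader reader = new PdfReader(pdfPath);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
for (int i = 1; i <= reader.NumberOfPages; i++) {
  ITextExtractionStrategy strategy = parser.ProcessContent(
    i, new SimpleTextExtractionStrategy()
  );
  if ( regex.IsMatch(strategy.GetResultantText()) ) {
    // do whatever with corresponding page number i...
  }
}
// release the file handle once all pages have been scanned
reader.Close();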
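If you need every matching page rather than a hook inside the loop, the same idea can be wrapped in a helper. This is only a minimal sketch, assuming iTextSharp 5.x: `PdfSearch` and `FindPages` are hypothetical names introduced here for illustration, and it swaps the regex for a plain case-insensitive substring check. It uses `PdfTextExtractor.GetTextFromPage`, the class mentioned in the comment on the question.

```csharp
using System;
using System.Collections.Generic;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class PdfSearch
{
    // Hypothetical helper (not part of iTextSharp): returns the 1-based
    // numbers of all pages whose extracted text contains searchText,
    // ignoring case.
    public static List<int> FindPages(string pdfPath, string searchText)
    {
        var pages = new List<int>();
        PdfReader reader = new PdfReader(pdfPath);
        try
        {
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // Extract the text of a single page.
                string text = PdfTextExtractor.GetTextFromPage(
                    reader, page, new SimpleTextExtractionStrategy());

                if (text.IndexOf(searchText, StringComparison.OrdinalIgnoreCase) >= 0)
                {
                    pages.Add(page);
                }
            }
        }
        finally
        {
            // Release the file handle even if extraction throws.
            reader.Close();
        }
        return pages;
    }
}
```

A call such as `PdfSearch.FindPages(pdfPath, "the")` would then give you the list of page numbers to display. Note that text extraction works on the page's content stream, so a phrase that is split across lines or drawn in separate chunks may not appear as one contiguous string in the extracted text.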
kuujinbo
1

As an alternative to iTextSharp, you can use Acrobat.dll to find the current page number. First, open the PDF file and search for the string using:

AcroAVDoc.Open("Filepath", "Temporary title")

and

AcroAVDoc.FindText("String").

If the string is found in the PDF file, the view moves to the page that contains it and the matched string is highlighted. You can then call AcroAVPageView.GetPageNum() to get the current page number.

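Dim AcroXAVDoc As CAcroAVDoc
Dim Acroavpage As AcroAVPageView
Dim AcroXApp As CAcroApp

AcroXAVDoc = CType(CreateObject("AcroExch.AVDoc"), Acrobat.CAcroAVDoc)
AcroXApp = CType(CreateObject("AcroExch.App"), Acrobat.CAcroApp)
AcroXAVDoc.Open(TextBox1.Text, "Original document")

' FindText arguments: search text, bCaseSensitive, bWholeWordsOnly, bReset.
' Only read the page number if the search actually found something.
If AcroXAVDoc.FindText("String to be searched", True, True, False) Then
    Acroavpage = AcroXAVDoc.GetAVPageView()

    ' Acrobat IAC page numbers start at 0, so add 1 for the number a user would see.
    Dim x As Integer = Acroavpage.GetPageNum()
    MsgBox("The string was found on page number " & (x + 1))
End If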
venkatesh