I am working on a feasibility study for an application that can capture text from PDF documents. The use case, in brief:

  1. The user selects text in a PDF document (using Acrobat Reader or another PDF reader).
  2. A selection-completed event should be available to the .NET application that is observing the reader.
  3. After making the selection, the user can specify further properties (such as category/level) in the .NET application, and this information is stored along with the selected text inside the PDF file itself.
  4. The selected text should stay highlighted. The highlight color depends on the other parameters (such as category/level) chosen in the .NET application.
  5. A separate application should be able to parse and gather this data from the PDF file.
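
To make steps 3 to 5 concrete, here is a minimal sketch of one possible data layout, not a working PDF integration. It uses plain Python dictionaries that mirror the entries of a PDF highlight annotation (`Subtype`, `Rect`, `QuadPoints`, `C`, `Contents`). The `Tag` record, the category names, the color mapping, and the idea of storing the tag as JSON in `Contents` are all assumptions made for illustration; a real implementation would write the annotation through a PDF library.

```python
import json
from dataclasses import dataclass

# Hypothetical category -> RGB mapping; components are in the 0..1 range
# that the annotation color entry (/C) uses.
CATEGORY_COLOURS = {
    "requirement": (1.0, 1.0, 0.0),
    "risk": (1.0, 0.6, 0.6),
    "note": (0.6, 1.0, 0.6),
}
DEFAULT_COLOUR = (0.8, 0.8, 0.8)


@dataclass
class Tag:
    """Hypothetical record for one tagged selection."""
    text: str
    category: str
    level: int


def to_highlight(tag, quad_points):
    """Build a dict shaped like a highlight annotation for the selection.

    quad_points is a flat list of x, y pairs covering the selected text.
    """
    xs = quad_points[0::2]
    ys = quad_points[1::2]
    return {
        "Type": "Annot",
        "Subtype": "Highlight",
        "Rect": [min(xs), min(ys), max(xs), max(ys)],
        "QuadPoints": list(quad_points),
        "C": list(CATEGORY_COLOURS.get(tag.category, DEFAULT_COLOUR)),
        # Assumption: the tag travels as JSON in the annotation's Contents.
        "Contents": json.dumps(
            {"text": tag.text, "category": tag.category, "level": tag.level}
        ),
    }


def from_highlight(annot):
    """Recover a Tag from an annotation dict, or None if it is not one of ours."""
    if annot.get("Subtype") != "Highlight":
        return None
    try:
        data = json.loads(annot.get("Contents", ""))
        return Tag(data["text"], data["category"], int(data["level"]))
    except (ValueError, KeyError, TypeError):
        return None
```

The separate parsing application from step 5 would then walk each page's annotations and call something like `from_highlight` on each one, skipping highlights whose contents are not in this format.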

A similar application is already working with MS Word files.

Edit:

The basic requirement is that there should be some way to notify the .NET application when the user selects some text in the PDF document. The other requirement is that there should be a way to add a tag to the selected text.

Can somebody suggest an API or resource for such an implementation?

Kangkan
  • Have you tried any Google searches for this? What kind of PDF .NET tools are you using? It sounds like you will have to write your own custom PDF watcher classes to capture the individual documents that are being parsed and/or edited. Also, how are you going to distinguish between a saved PDF and one that is edited and then canceled? I would personally look into doing something along the lines of OCR. – MethodMan Dec 26 '11 at 09:58
  • I am currently looking through the iText documentation but have not found any definitive information. I am also searching Google. This question is meant to gather first-hand information or experience from fellow members here. – Kangkan Dec 26 '11 at 10:06
  • Oh, I totally understand that, I was just wondering. Some things are easier than others to create in terms of code. I bet it can be done, but it would probably take looking at the way MS does it with a Word doc and then playing around with PDF objects in its place. I won't say this will be easy, but I had to do that once when I coded Delphi creating PDFs from QuickReports. I accomplished my task by looking through every single class library the 3rd party had, and eventually found what I needed without having to override any of their base classes. – MethodMan Dec 26 '11 at 10:11
  • Now that I think about it, I did something close to what you are doing using ActiveReports and the .NET version, but I was using it along with a web application. – MethodMan Dec 26 '11 at 10:13
  • PDF creation is quite a simple task now, with a host of open-source and other libraries available. What I am looking for is a library that lets me capture events on the PDF document, in the same way that one can use the Office interop assemblies for MS Office. – Kangkan Dec 27 '11 at 09:03
  • Gotcha. I will look around and see what I can find. I have some PDF gurus here at work as well. It may take a while since it's the holiday season. – MethodMan Dec 27 '11 at 14:01
  • @DJKRAZE: I shall await your response! – Kangkan Dec 28 '11 at 09:04
  • I have not found anything that works without a 3rd-party tool. – MethodMan Dec 28 '11 at 14:19
  • @DJKRAZE: I am starting with the Acrobat SDK. – Kangkan Jan 04 '12 at 06:40
  • I have had no luck yet. I was thinking that iTextSharp might be able to handle what you are looking for, but I am still searching in my spare time. – MethodMan Jan 04 '12 at 14:18

1 Answer

Take a look at Amyuni PDF Creator .Net:

Usual disclaimer applies

yms