5

I need to extract text from a PDF file. I've found iTextSharp and PDFBox, but both are ports of Java libraries, and to make them work I need to ship large additional DLLs.

So, my question is: is there a native C# library for extracting text from PDF files? If there isn't one, is it hard to write one?

xZ6a33YaYEfmv
  • 2
    "If there is no any, is it hard to write it?" If it wasn't hard, someone would've written one already. – BoltClock Apr 16 '11 at 22:38
  • possible duplicate of [PDF Reader](http://stackoverflow.com/questions/905683/pdf-reader) – Hans Passant Apr 16 '11 at 22:39
  • No, nothing native, and yes, it's very difficult. – Bob G Apr 16 '11 at 22:56
  • 1
    If iTextSharp does not meet your needs, then you will probably need to go with a commercial (paid) product. And yes, iTextSharp is a port from Java, but it was rewritten in C#, so it is managed code. – Jim Apr 17 '11 at 00:26
  • 2
    @Jim iTextSharp/iText are also paid products unless used in open source projects. – Bobrovsky Apr 17 '11 at 06:30

2 Answers

3

The Docotic.Pdf library can be used to extract text from PDF files.

The library has no external dependencies and is written in C#. Docotic.Pdf comes in four editions.

Disclaimer: I work for Bit Miracle.

Bobrovsky
3

There's PdfSharp

erikkallen