0

Is there a clear, reliable way to convert a PDF file into a Word file, preserving all formatting and images, in an ASP.NET web application?

Krishanu Dey

2 Answers

1

The best way to do this is to use OCR. It recognizes the text and images in the PDF file, and you can then save the result as a DOC file. One third-party toolkit that should meet your requirements is LEADTOOLS, since it supports the ASP.NET environment. You can try their online OCR demo, check their website for more information, or contact their support team.

0

PDF is a presentation format in which all content is placed at absolute positions. There are no paragraphs or other structural elements (unless the file is a Tagged PDF). Technically, a PDF can emit its text character by character in any order and still look like normal text on the page. A proper conversion to Word therefore requires content recognition or some kind of OCR (e.g. ABBYY FineReader).
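To illustrate why layout recognition is needed, here is a minimal, language-neutral sketch in Python (not ASP.NET code, and not any particular library's API). It assumes text has already been extracted as word fragments with page coordinates; the `Fragment` type, the `rebuild_lines` helper, and the tolerance value are all hypothetical. It groups fragments into lines by baseline and orders each line left to right, which is roughly the first step a converter has to perform before it can detect paragraphs.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """A piece of text with an absolute page position (hypothetical type)."""
    text: str
    x: float  # left edge, in points
    y: float  # baseline, in points; the PDF y axis grows upward


def rebuild_lines(fragments, line_tolerance=2.0):
    """Group fragments into lines by baseline, then order each line by x."""
    lines = []  # list of (baseline_y, [fragments on that line])
    # Visit fragments from the top of the page to the bottom.
    for frag in sorted(fragments, key=lambda f: -f.y):
        if lines and abs(lines[-1][0] - frag.y) <= line_tolerance:
            lines[-1][1].append(frag)
        else:
            lines.append((frag.y, [frag]))
    # Within each line, restore left-to-right reading order.
    return [
        " ".join(f.text for f in sorted(frags, key=lambda f: f.x))
        for _, frags in lines
    ]


# Fragments stored out of reading order, as a PDF is allowed to do.
sample = [
    Fragment("world", 140, 700),
    Fragment("Second", 72, 686),
    Fragment("Hello", 72, 700),
    Fragment("line", 120, 686),
]

if __name__ == "__main__":
    for line in rebuild_lines(sample):
        print(line)
```

A real converter must also handle columns, tables, rotated text, fonts, and images, which is why dedicated recognition tools are usually used instead.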

There are paid components on the market that do text extraction, and some that convert pages to images (obviously, that is not a desirable approach for converting to Word).

Josh Slone