-2

I'm looking for a code snippet or other solution that can convert a high volume (thousands) of PDFs into .html or .doc while at the same time:

  • maintaining the hierarchical structure of headings
  • capturing images in the document, uploading them to an image server, and creating an absolute link to each one
  • maintaining table formatting.

Does such a tool exist and if so, who makes it? If not, who are some of the thought leaders in the space that I can connect with?

Cognitivity
  • 31
  • 1
  • 3

1 Answer

0

Check out pdftohtml.

You can then add some scripting around it to do a batch conversion.

The results aren’t that great, though.
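For the batch part, here is a minimal sketch of what that scripting could look like in Python. It assumes the Poppler build of pdftohtml is installed and on the PATH, and that all the PDFs sit in one flat folder. The function names (`build_command`, `convert_all`) and the `runner` parameter are made up for illustration. It only covers the conversion step; uploading the extracted images and rewriting their links would need to be a separate pass.

```python
import subprocess
from pathlib import Path


def build_command(pdf_path, out_dir):
    """Return the pdftohtml command line for a single PDF."""
    # Output base name: <out_dir>/<pdf name without extension>
    out_base = Path(out_dir) / Path(pdf_path).stem
    # -q suppresses messages; -noframes writes plain HTML instead of a frameset
    return ["pdftohtml", "-q", "-noframes", str(pdf_path), str(out_base)]


def convert_all(src_dir, out_dir, runner=subprocess.run):
    """Convert every .pdf directly inside src_dir.

    Returns the list of PDFs whose conversion exited with a non-zero code.
    `runner` is a hypothetical hook so the loop can be exercised without
    pdftohtml being installed; by default it is subprocess.run.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    failed = []
    for pdf in sorted(Path(src_dir).glob("*.pdf")):
        result = runner(build_command(pdf, out))
        if result.returncode != 0:
            failed.append(pdf)
    return failed
```

Calling `convert_all("pdfs", "html")` would then walk the (assumed) `pdfs` folder and write the output into `html`. Collecting failures instead of stopping at the first one seems like the safer choice when there are thousands of files, since a handful of malformed PDFs is likely.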

TRiG
  • 10,148
  • 7
  • 57
  • 107
xvan
  • 4,554
  • 1
  • 22
  • 37