I need to convert PDF files to HTML.
I can do this manually via several steps, using this (Rube) Goldberg variation:
0) Save PDF as text
1) Copy-and-paste text into MS Word
2) Save MS Word doc as HTML
I feel like I'm walking on my hands doing that, though.
Is there a programmatic way to accomplish the same thing, so that I could do something like:
string htmlFile = ConvertPDFToHTML("FrumiousBandersnatch.PDF");
[…] `<p>` tags added, you don't need that step. Adding `<p>`s around lines may be simpler with a utility such as `sed`. You could try [`pdftotext`](http://en.wikipedia.org/wiki/Pdftotext) for the first step.
– Jongware Feb 13 '14 at 21:24
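
The comment's suggestion (extract text with `pdftotext`, then add tags with `sed`) could be sketched roughly as follows. This is a minimal sketch, not a full converter: it assumes `pdftotext` (from Poppler or Xpdf) is installed and on the `PATH`, it assumes `<p>` is the tag you want around each line, and the function names `text_to_html` and `pdf_to_html` are hypothetical helpers made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read plain text on stdin, write HTML paragraphs on stdout.
text_to_html() {
    # 1) Escape &, < and > so the text is safe inside HTML.
    # 2) Drop blank (whitespace-only) lines.
    # 3) Wrap every remaining line in <p>...</p> (assumed tag choice).
    sed -e 's/&/\&amp;/g' \
        -e 's/</\&lt;/g' \
        -e 's/>/\&gt;/g' \
        -e '/^[[:space:]]*$/d' \
        -e 's|^.*$|<p>&</p>|'
}

# Hypothetical helper: convert the PDF named in $1 to an HTML fragment on stdout.
pdf_to_html() {
    # "-" as the output file makes pdftotext write to stdout;
    # tr removes the form feeds pdftotext emits between pages.
    pdftotext "$1" - | tr -d '\f' | text_to_html
}
```

Usage would look something like `pdf_to_html FrumiousBandersnatch.PDF > FrumiousBandersnatch.html`. The output is only a sequence of `<p>` elements, with no `<html>` or `<body>` wrapper, and all layout, fonts, and images from the PDF are lost, just as in the manual save-as-text route.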