I'm building an application to view PDFs in a browser on mobile devices without the need for a plugin. I tried ImageMagick and Ghostscript to convert the pages to images, but the images are far too large and the text becomes unclear. I see websites offering a service that converts PDFs into HTML and does a decent job, but I can't find an example of how this is accomplished. Any help is much appreciated. Thanks!
Old question, but: you could get in touch with the online service(s) you mention, and ask how they do it. Some won't say, but there's no harm in trying. – halfer Apr 24 '12 at 19:11
3 Answers
If you are looking at converting PDF to HTML and plan to run the conversion on a server, you can try pdftohtml, a program packaged as part of poppler-utils. I do not know how the program accomplishes the conversion internally.
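For a server-side setup, a minimal sketch of calling the poppler tool from Python might look like the following. The function names `build_pdftohtml_command` and `convert` are made up for illustration, and the flag selection (`-c`, `-s`, `-noframes`) is only one plausible combination, so check `pdftohtml -h` on your system before relying on it.

```python
import shutil
import subprocess
from typing import List, Optional


def build_pdftohtml_command(pdf_path: str, out_base: str,
                            first_page: Optional[int] = None,
                            last_page: Optional[int] = None) -> List[str]:
    """Build the argument list for poppler's pdftohtml (illustrative helper)."""
    cmd = [
        "pdftohtml",
        "-c",         # "complex" output that tries to preserve page layout
        "-s",         # one HTML document containing all pages
        "-noframes",  # do not generate a frameset
    ]
    if first_page is not None:
        cmd += ["-f", str(first_page)]  # first page to convert
    if last_page is not None:
        cmd += ["-l", str(last_page)]   # last page to convert
    cmd += [pdf_path, out_base]
    return cmd


def convert(pdf_path: str, out_base: str) -> None:
    """Run the conversion; raises if pdftohtml is missing or exits with an error."""
    if shutil.which("pdftohtml") is None:
        raise RuntimeError("pdftohtml not found; install poppler-utils")
    subprocess.run(build_pdftohtml_command(pdf_path, out_base), check=True)
```

Something like `convert("report.pdf", "report")` would then write the HTML next to the given base name. The exact output file names depend on the flags and the poppler version, so inspect the output directory rather than assuming a name.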

Thanks for the tip. I have used that before for a different project and it worked quite well, although I wish to accomplish this locally on the device itself. – Tw1tCh Aug 08 '12 at 00:34
While googling, I came across the link below, which explains how scribd.com implements its conversion: http://coding.scribd.com/2010/06/01/the-perils-of-stacking/

EDIT: I seem to have read the question backwards. In this case it might be best to parse through the PDF and then format some HTML based on what you find. I believe the javapdf option is capable of this, but I haven't used any of these so I am not sure. If worse comes to worst and you can't find software to disassemble a PDF, you might be able to write your own disassembler in Java or PHP by reading the PDF specification. Best of luck!
http://www.adobe.com/devnet/pdf/pdf_reference.html - PDF Specification (Adobe's version; because Adobe's tools are the most popular, you may want to support their extensions)
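To illustrate the "format some HTML based on what you find" step, here is a rough sketch. It assumes some PDF parser has already produced text runs with positions; the `TextRun` structure and `runs_to_html` function are hypothetical and do not correspond to any particular library. Each run becomes an absolutely positioned span, so the text stays sharp and selectable instead of being rasterized.

```python
from dataclasses import dataclass
from html import escape
from typing import Iterable


@dataclass
class TextRun:
    """Hypothetical parser output: one piece of text and where it sits on the page."""
    text: str
    x: float          # left offset in points
    y: float          # top offset in points (assumed already flipped to top-down)
    font_size: float  # in points


def runs_to_html(runs: Iterable[TextRun], page_width: float, page_height: float) -> str:
    """Render one page of text runs as absolutely positioned HTML spans."""
    spans = []
    for run in runs:
        style = (f"position:absolute;left:{run.x:g}pt;top:{run.y:g}pt;"
                 f"font-size:{run.font_size:g}pt;white-space:pre")
        # Escape the text so characters like < and & do not break the markup.
        spans.append(f'<span style="{style}">{escape(run.text)}</span>')
    page_style = f"position:relative;width:{page_width:g}pt;height:{page_height:g}pt"
    return f'<div class="page" style="{page_style}">' + "".join(spans) + "</div>"
```

For example, `runs_to_html([TextRun("Hello", 72, 100, 12)], 612, 792)` yields a single page `div` for a US Letter sized page. Extracting the runs themselves (fonts, encodings, coordinate transforms) is the hard part and is not shown here.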
-- OLD -- These websites probably write their own proprietary software to do the trick. If you are truly interested in this undertaking, I would suggest parsing the HTML to get the data and style information and using it to drive some sort of PDF writer API. A quick Google search yields the following: -- END OLD --
http://www.cutepdf.com/Solutions/

I think you may have misunderstood the question. I meant that I want to convert the PDF to a format viewable in a browser without the need for a plugin. From what I gathered, the links you provided only describe creating a PDF. Thanks for looking anyway. – Tw1tCh Jun 06 '11 at 15:20
You are right, I seem to have read the question backwards. Some of this software may also include PDF reading capabilities, in which case you do the opposite of what I said previously: read the PDF and format some HTML with the information provided by the reader. I am pretty sure the javapdf option will do the trick. – CuddleBunny Jun 13 '11 at 18:45