2

I'm building an application to view PDFs in a browser on mobile devices without needing a plugin. I tried ImageMagick and Ghostscript to convert the pages to images, but the images are far too large and the text becomes unclear. I see websites offering a service that converts PDFs to HTML and does a decent job, but I can't find an example of how this is accomplished. Any help is much appreciated. Thanks!

Tw1tCh
  • 89
  • 1
  • 12
  • Old question, but: you could get in touch with the online service(s) you mention, and ask how they do it. Some won't say, but there's no harm in trying. – halfer Apr 24 '12 at 19:11

3 Answers

1

If you want to convert PDF to HTML and plan to run the conversion on a server, you can try pdftohtml. It is a program packaged as part of poppler-utils. I do not know how the program accomplishes it.
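For a server-side setup, a minimal sketch of calling the poppler-utils converter (the binary is named `pdftohtml`) from Python might look like the following. The helper names `build_pdftohtml_cmd` and `convert` are made up for illustration, and the flag choice (`-c` for layout-preserving output, `-s` for a single document, `-f`/`-l` for a page range) is just one reasonable assumption; check `pdftohtml -h` on your system before relying on it.

```python
import shutil
import subprocess
from typing import List, Optional


def build_pdftohtml_cmd(pdf_path: str, out_prefix: str,
                        first_page: Optional[int] = None,
                        last_page: Optional[int] = None) -> List[str]:
    """Build the argument list for poppler's pdftohtml (hypothetical helper)."""
    # -c: "complex" output that tries to preserve the page layout
    # -s: emit a single HTML document covering all pages
    cmd = ["pdftohtml", "-c", "-s"]
    if first_page is not None:
        cmd += ["-f", str(first_page)]  # first page to convert
    if last_page is not None:
        cmd += ["-l", str(last_page)]   # last page to convert
    cmd += [pdf_path, out_prefix]
    return cmd


def convert(pdf_path: str, out_prefix: str) -> None:
    """Run the conversion; raises if pdftohtml is missing or exits non-zero."""
    if shutil.which("pdftohtml") is None:
        raise RuntimeError("pdftohtml not found; install poppler-utils")
    subprocess.run(build_pdftohtml_cmd(pdf_path, out_prefix), check=True)
```

Calling `convert("report.pdf", "report")` would write the HTML and any extracted images next to the given output prefix. Building the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.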

FFL
  • 669
  • 7
  • 22
  • Thanks for the tip. I have used that before for a different project and it worked quite well, but I want to accomplish this locally on the device itself. – Tw1tCh Aug 08 '12 at 00:34
1

I was googling and came across the link below, which explains how scribd.com implements the conversion. http://coding.scribd.com/2010/06/01/the-perils-of-stacking/

FFL
  • 669
  • 7
  • 22
1

EDIT: I seem to have read the question backwards. In this case it might be best to parse through the PDF and then format some HTML based on what you find. I believe the javapdf option is capable of this, but I haven't used any of these so I am not sure. If worse comes to worst and you can't find software to disassemble a PDF, you might be able to write your own disassembler in Java or PHP by reading the PDF specification. Best of luck!

http://www.adobe.com/devnet/pdf/pdf_reference.html - PDF Specification (Adobe's version; since Adobe's tools are the most popular, you may want to support their extensions)
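To give a feel for what writing your own disassembler involves, here is a rough sketch of the very first step: reading the `%PDF-x.y` header and the `startxref` offset near the end of the file, which points to the cross-reference table. The function name `read_pdf_skeleton` is invented for this example, it is written in Python rather than Java or PHP, and it assumes a simple, well-formed file. A real parser still has to handle the cross-reference table, object streams, compression, fonts and encryption.

```python
import re
from typing import Tuple


def read_pdf_skeleton(data: bytes) -> Tuple[str, int]:
    """Return (version, startxref offset) from raw PDF bytes (illustrative only)."""
    # A PDF starts with a header comment such as "%PDF-1.4".
    header = re.match(rb"%PDF-(\d+\.\d+)", data)
    if header is None:
        raise ValueError("not a PDF: missing %PDF- header")

    # The trailer near the end of the file contains "startxref" followed by
    # the byte offset of the cross-reference table, then "%%EOF".
    tail = data[-1024:]
    idx = tail.rfind(b"startxref")
    if idx == -1:
        raise ValueError("no startxref marker found")
    offset = re.match(rb"startxref\s+(\d+)", tail[idx:])
    if offset is None:
        raise ValueError("startxref is not followed by an offset")

    return header.group(1).decode("ascii"), int(offset.group(1))
```

From the returned offset a parser would read the cross-reference table, locate each object, and only then get to page content and text, which is why an existing library is usually the easier route.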

-- OLD -- These websites probably write their own proprietary software to do the trick. If you are truly interested in this undertaking, I would suggest parsing the HTML to get the data and style information and feeding it to one of the PDF writer APIs. A quick Google search yields the following: -- END OLD --

http://www.cutepdf.com/Solutions/

http://ruby-pdf.rubyforge.org/pdf-writer/doc/index.html

http://asprise.com/product/javapdf/

CuddleBunny
  • 1,941
  • 2
  • 23
  • 45
  • I think you may have misunderstood the question. I meant that I want to convert the PDF to a format viewable in a browser without the need for a plugin. From what I gathered, the links you provided only describe creating a PDF. Thanks for looking anyway. – Tw1tCh Jun 06 '11 at 15:20
  • You are right, I seem to have read the question backwards. Some of this software may also include PDF reading capabilities. In that case you do the opposite of what I said previously: read the PDF and format some HTML with the information provided by the reader. I am pretty sure the javapdf option will do the trick. – CuddleBunny Jun 13 '11 at 18:45