
We have a proprietary on-line book/manual generation system that creates compiled help files (CHM) as well as HTML versions of content from a database.

This works very well, and we've been using the system without significant modifications for several years.

Now, we'd also like to generate PDFs from the same source. I've looked at a few CHM-to-PDF and HTML-to-PDF converters, but I haven't been able to find one that handles hyperlinks correctly and, despite the number of times similar questions have been asked here, it doesn't seem that anyone's found a good solution (or at least they haven't bothered to post any information about it).

Any suggestions? I'd hate to have to write a PDF generator when so much work has already been done in that area.

3Dave

2 Answers


Usually you generate all of these from an abstract source. HTML as a standard is so expansive that it is very, very hard to write a generic converter from HTML to something that isn't HTML; the input could be just about anything. And CHM seems to support HTML pretty much as far as MSIE does (read: a lot).

So probably the smartest thing is to determine the HTML subset you actually use as the base format, and try to generate the output from that. Consider simplifying the HTML with mass-replace edits and scripts, and then extracting the bulk of the content with minimal formatting (and in some abstract form).
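As a rough illustration of the "determine the HTML subset" step, here is a minimal Python sketch that counts which tags and attributes actually occur in a set of HTML fragments. It uses only the standard library; the names `TagInventory` and `inventory` are made up for this example, and how the fragments get loaded from the database is left out.

```python
from collections import Counter
from html.parser import HTMLParser


class TagInventory(HTMLParser):
    """Counts every start tag and (tag, attribute) pair in the HTML it is fed."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()
        self.attrs = Counter()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <hr />.
        self.tags[tag] += 1
        for name, _value in attrs:
            self.attrs[(tag, name)] += 1


def inventory(html_fragments):
    """Return (tag counts, attribute counts) over all fragments."""
    tags, attrs = Counter(), Counter()
    for fragment in html_fragments:
        # A fresh parser per fragment keeps a broken fragment
        # from leaking buffered state into the next one.
        parser = TagInventory()
        parser.feed(fragment)
        parser.close()
        tags += parser.tags
        attrs += parser.attrs
    return tags, attrs


if __name__ == "__main__":
    # Hypothetical fragments; in practice these would come from the content store.
    fragments = [
        '<p>See <a href="#install">Install</a>.</p>',
        '<hr />',
        '<h1 id="install">Install</h1>',
    ]
    tags, attrs = inventory(fragments)
    for tag, count in tags.most_common():
        print(tag, count)
```

If the resulting list is short, a converter only has to handle those tags and attributes, which is a much smaller problem than handling HTML in general.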

A converter for the general case of HTML -> something is very hard or suboptimal (think of putting pictures of rendered content in the PDF), so you are always talking about subsets.

Marco van de Voort
  • The "abstract source" in this case is a subset of HTML stored in SQL Server. The existing converters handle it very well - the only issue is links that aren't converted (the best I've seen is underlining of links). I'm about at the point of finding a PDF toolkit and implementing it myself, which seems like something that should be completely unnecessary. Argh. – 3Dave Oct 18 '11 at 13:39

I am part owner of a business that converts HTML to PDF: Docraptor.

Here is a sample that I believe demonstrates "correct" handling of hyperlinks. That is, the external link is sent to the default web browser, and the hash link jumps to the relevant place in the PDF where “Test!” appears. You can check out the PDF output of this sample here.

<html>
  <head>
    <style type="text/css">
      hr {page-break-after:always;}
    </style>
  </head>
  <body>
    <a href="http://www.google.com">Google</a>
    <a href="#test">Test?</a>
    <hr />
    <h1 id="test">Test!</h1>
  </body>
</html>

The hr style in this sample just forces a page break, so that there are two pages and the hash link has somewhere to jump to.
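Whichever converter is used, it can help to check the links in the source HTML before converting, since a hash link can only work in the PDF if its target exists in the same document. The following is a minimal Python sketch of such a check, using only the standard library. The names `LinkAudit` and `audit_links` are invented for this example, and it is not part of any converter's API.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkAudit(HTMLParser):
    """Collects link hrefs and possible in-document link targets."""

    def __init__(self):
        super().__init__()
        self.external = []   # hrefs with a scheme, e.g. http://...
        self.internal = []   # fragment names from hrefs like "#test"
        self.relative = []   # everything else, e.g. "page2.html#x"
        self.targets = set() # id values and <a name="..."> values

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id"):
            self.targets.add(attrs["id"])
        if tag != "a":
            return
        if attrs.get("name"):
            self.targets.add(attrs["name"])
        href = attrs.get("href")
        if href is None:
            return
        if href.startswith("#"):
            self.internal.append(href[1:])
        elif urlsplit(href).scheme:
            self.external.append(href)
        else:
            self.relative.append(href)


def audit_links(html):
    """Classify the links in one HTML document and list dangling hash links."""
    parser = LinkAudit()
    parser.feed(html)
    parser.close()
    dangling = [name for name in parser.internal if name not in parser.targets]
    return {
        "external": parser.external,
        "internal": parser.internal,
        "relative": parser.relative,
        "dangling": dangling,
    }


if __name__ == "__main__":
    sample = """
    <a href="http://www.google.com">Google</a>
    <a href="#test">Test?</a>
    <hr />
    <h1 id="test">Test!</h1>
    """
    print(audit_links(sample))
```

Links that end up in the `relative` list point at other pages, so they would presumably need to be rewritten to hash links if several HTML pages are merged into a single PDF.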

Joel Meador
  • Looks promising - oddly enough I just saw your answer, and the project is still in the backlog. Thanks! – 3Dave Jan 07 '13 at 08:20