13

I need to convert some HTML reports into PDF using Perl. What are the best CPAN modules for the job?

Sinan Ünür
  • It depends a bit on whether you need CSS support. Most of the solutions support only very basic HTML (3.0 or so) and little or no CSS. – sventechie Sep 21 '11 at 20:07
  • 2
    Try wkhtmltopdf (http://code.google.com/p/wkhtmltopdf). It's a console HTML-to-PDF converter that also supports CSS styles. – Vit D Jul 15 '13 at 14:38

6 Answers

7

I hope PDF::FromHTML may be of help.

szabgab
Alan Haggai Alavi
  • Is this available for Windows? I'm running Perl on Windows. – user2829 Dec 09 '11 at 21:38
  • Yes, it runs on Windows as well. Reference: [CPAN Testers](http://www.cpantesters.org/distro/P/PDF-FromHTML.html). – Alan Haggai Alavi Dec 10 '11 at 03:08
  • 1
    No CSS for you! "CAVEATS: [...] This means any HTML using external or inline CSS for design and layout, including but not limited to: images, backgrounds, colours, fonts etc... will not be converted into the PDF." – Pablo Bianchi Jul 03 '18 at 01:18
3

HTML::HTMLDoc is a Perl interface to the underlying htmldoc program, which is written in C and built to do just this. It's pretty fast, too.

szabgab
mpeters
  • 1
    As of September 2011, only the development (beta) version 1.9 partially supports HTML 4.0 and CSS. However, it does appear well designed and documented. – sventechie Sep 21 '11 at 20:12
3

PDF::WebKit

I was able to convert HTML to PDF in Perl with PDF::WebKit, which in turn uses wkhtmltopdf. From apt show wkhtmltopdf:

Command line utilities to convert html to pdf or image using WebKit

wkhtmltopdf is a command line program which permits one to create a pdf or an image from an url, a local html file or stdin. It produces a pdf or an image like rendered with the WebKit engine.

This program requires an X11 server to run.

That requirement would make this solution unacceptable on a server, where alternatives such as WeasyPrint (built with Python), athenapdf, or Pandoc might be worth a look. However, the latest version of wkhtmltopdf is headless (does not require an X server).

Installation:

sudo cpanm PDF::WebKit
sudo apt install xfonts-75dpi
sudo apt install wkhtmltopdf

Use the .deb package from the official wkhtmltopdf site to get the latest version.

html2pdf.pl

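#!/usr/bin/perl
use strict;
use warnings;
use PDF::WebKit;
# A plain string argument is treated as a URL or a path to an HTML file
my $kit = PDF::WebKit->new('/tmp/index.html');
# Render through wkhtmltopdf and write the resulting PDF to disk
my $file = $kit->to_file('/tmp/my.pdf');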

Sample, index.html:

<!DOCTYPE html>
<html lang="en">
<head>
  <title>My Title</title>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.5.0/css/bootstrap.min.css">
</head>
<body>
  <p class="text-primary">.text-primary</p>
  <p class="text-secondary">.text-secondary</p>
  <p class="text-success">.text-success</p>
  <p class="text-danger">.text-danger</p>
  <p class="text-warning">.text-warning</p>
</body>
</html>

screenshot

Pablo Bianchi
1

I've used PDF::API2 to create PDF reports with great success.

szabgab
friedo
  • 2
    PDF::API2 is very powerful, but it requires pixel-level addresses for layout -- not good for easily formatting text. – sventechie Sep 21 '11 at 20:13
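To illustrate the layout point in the comment above, here is a minimal sketch (core Perl only, no PDF module needed) of the bookkeeping a coordinate-based API leaves to the caller. It assumes PDF units of points (1/72 inch) with the origin at the bottom-left corner; the function name `layout_lines`, its options, and the default sizes are hypothetical and chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compute where each line of text would go on the page.
# Units are PDF points (1/72 inch); the origin is the bottom-left corner.
# All names and defaults here are illustrative assumptions.
sub layout_lines {
    my ($lines, %opt) = @_;
    my $page_height = $opt{page_height} // 842;   # A4 is about 595 x 842 pt
    my $margin      = $opt{margin}      // 72;    # 1 inch
    my $leading     = $opt{leading}     // 14;    # distance between baselines

    my @placed;
    my ($page, $y) = (0, $page_height - $margin);
    for my $line (@$lines) {
        if ($y < $margin) {            # no room left: start a new page
            $page++;
            $y = $page_height - $margin;
        }
        push @placed, { page => $page, x => $margin, y => $y, text => $line };
        $y -= $leading;
    }
    return @placed;
}

my @report = map { "Row $_" } 1 .. 60;
for my $item (layout_lines(\@report)) {
    # With PDF::API2, each entry would turn into something like:
    #   $text->translate($item->{x}, $item->{y});
    #   $text->text($item->{text});
    printf "page %d  x=%d  y=%d  %s\n", @{$item}{qw(page x y text)};
}
```

Wrapping long lines, mixing fonts, or drawing tables would each need more arithmetic of this kind, which is probably why an HTML-based converter is easier for formatted reports.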
1

It depends on exactly what you need to do, but I'd probably look at Template::Extract and PDF::Template.

Sinan Ünür
Quentin
1

PrinceXML

This doesn't answer your question in the sense of using Perl, but as far as I know it is the best HTML-to-PDF converter available.

Adam Flott
  • I've heard good things about it too. It does document using it from Perl ( http://www.princexml.com/doc/6.0/perl/ ) but that just boils down to "Use STDIN/STDOUT". – Quentin Jul 14 '09 at 12:04
  • PrinceXML appears to support CSS well (one of the authors helped draft the original CSS specification) but it costs quite a lot for commercial use. They also provide a CSS example file for pagination of HTML/XML documents and formatting books. – sventechie Sep 21 '11 at 20:10
  • I work at Expected Behavior, and we've got an HTML to PDF API called DocRaptor that uses Prince as the rendering engine for PDFs. Our service is a good way of getting the quality of Prince without doing a server-side installation. http://docraptor.com – illbzo1 Dec 13 '12 at 20:15
  • 1
    We have been using PrinceXML at my company for a while, and it is absolutely amazing. Sure, it is not free or cheap, but much cheaper than trying to build it all yourself. – Mauritz Hansen Jan 07 '13 at 06:55
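The "use STDIN/STDOUT" approach mentioned in the comments can be sketched with the core module IPC::Open2. This is only a sketch under assumptions: the helper name `convert_via_pipe` is hypothetical, and it presumes the converter can read HTML from standard input and write the PDF to standard output. The demo uses `cat` as a stand-in so it runs without any converter installed; the exact command line for Prince should be taken from its own documentation.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IPC::Open2;

# Send $html to an external converter on its STDIN and return whatever
# it writes to STDOUT. $cmd is an array ref holding the command line
# (assumption: the tool reads HTML from STDIN and writes PDF to STDOUT).
sub convert_via_pipe {
    my ($cmd, $html) = @_;

    my $pid = open2(my $from_child, my $to_child, @$cmd);
    binmode $_ for $from_child, $to_child;

    # Write all input, then close so the child sees end-of-file.
    # Fine for tools that read everything before writing; a tool that
    # streams output while reading could block on very large inputs.
    print {$to_child} $html;
    close $to_child;

    my $output = do { local $/; <$from_child> };
    close $from_child;

    waitpid $pid, 0;
    die "@$cmd exited with status " . ($? >> 8) . "\n" if $? != 0;

    return $output;
}

# Demo: "cat" stands in for a real converter, so the bytes come back unchanged.
my $html = "<html><body><h1>Report</h1></body></html>\n";
my $out  = convert_via_pipe(['cat'], $html);
printf "sent %d bytes, received %d bytes\n", length $html, length $out;
```

With wkhtmltopdf installed, passing `['wkhtmltopdf', '-', '-']` as the command should work the same way, since it accepts `-` for both input and output; the returned bytes can then be written to a file opened in binary mode.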