
I currently have a project where HTML code is dynamically generated from spreadsheets; the HTML is then converted to PDF.

I need to keep the CSS and JavaScript formatting (such as Bootstrap) when I convert the file, and I also need the hyperlinks to keep working.


I have tried:

  • wkhtmltopdf through pdfkit in Python, which does preserve hyperlinks but loses all of my CSS/JS formatting. As some forums suggested, I tried external, internal, and inline CSS in my HTML file, to no avail.
    This is what the pdfkit code looks like:
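```python
import os

import pdfkit

cwd = os.getcwd()

# On Windows, pdfkit must be told where the wkhtmltopdf executable lives.
path_wkhtmltopdf = r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe"
config = pdfkit.configuration(wkhtmltopdf=path_wkhtmltopdf)

file_name = "index"
file_html = file_name + ".html"
file_pdf = file_name + ".pdf"

source_HTML = os.path.join(cwd, file_html)

# enable-local-file-access lets wkhtmltopdf load the local CSS/JS/images
# the HTML refers to (blocked by default since wkhtmltopdf 0.12.6).
pdfkit.from_file(source_HTML, file_pdf, configuration=config,
                 options={"enable-local-file-access": ""})
```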

  • Simply using Microsoft Convert to PDF, which does preserve the formatting but drops the hyperlinks and, more importantly, can't be automated (I need to produce 200+ PDFs at a time).
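Since the job is 200+ PDFs per run, the pdfkit call above can be wrapped in a loop over the generated files. This is a minimal sketch, assuming all generated HTML files land in one folder (`html_out` and `pdf_out` are hypothetical names). `javascript-delay` and `no-stop-slow-scripts` are standard wkhtmltopdf options that give scripts more time to run before the page is rendered; they may help with JS-driven content, but wkhtmltopdf's older WebKit engine may still fail to render newer CSS such as the flexbox layout that Bootstrap 4+ relies on.

```python
from pathlib import Path

# Hypothetical locations; adjust to your project.
WKHTMLTOPDF = r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe"
HTML_DIR = Path("html_out")
PDF_DIR = Path("pdf_out")

# wkhtmltopdf options as pdfkit expects them: no leading "--",
# and "" for flags that take no value.
OPTIONS = {
    "enable-local-file-access": "",  # allow local CSS/JS/image files
    "javascript-delay": "1000",      # wait 1000 ms for scripts before rendering
    "no-stop-slow-scripts": "",      # don't abort long-running scripts
}


def pdf_path_for(html_file: Path, pdf_dir: Path) -> Path:
    """Map e.g. html_out/report.html to pdf_out/report.pdf."""
    return pdf_dir / html_file.with_suffix(".pdf").name


def convert_all(html_dir: Path, pdf_dir: Path) -> None:
    """Convert every .html file in html_dir to a same-named PDF in pdf_dir."""
    import pdfkit  # imported here so pdf_path_for works without pdfkit installed

    config = pdfkit.configuration(wkhtmltopdf=WKHTMLTOPDF)
    pdf_dir.mkdir(parents=True, exist_ok=True)
    for html_file in sorted(html_dir.glob("*.html")):
        pdfkit.from_file(str(html_file), str(pdf_path_for(html_file, pdf_dir)),
                         configuration=config, options=OPTIONS)
```

After the spreadsheet step has written the HTML, calling `convert_all(HTML_DIR, PDF_DIR)` would produce one PDF per HTML file.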

I have written most of this project in Python, both to generate and to convert the PDFs. If there is a way to automate the conversion, preserve the formatting, and keep the hyperlinks using packages or libraries from other languages, I am more than willing to try.

I have heard that I could rebuild everything in LaTeX from the ground up to accomplish this, but I'd rather not switch away from the HTML/CSS/JS stack.

futium
  • @KJ Would I need Acrobat for that, or is there a way to use a headless Edge client for this? – futium Jun 14 '23 at 02:16
  • @KJ I just tried it, and although it preserves the formatting, the page is somewhat larger than desired and off-center, which I imagine could be tinkered with; however, the hyperlinks do not seem to work. – futium Jun 14 '23 at 16:09
  • @KJ I have two hyperlinks: one is an email address, the other is a LinkedIn page: qq1@gmail.com LinkedIn – futium Jun 14 '23 at 16:26
  • @KJ When I use this page as the source file, the links are saved properly, but when I use my own local HTML file, the output shows the link's innerHTML first, followed by the href in parentheses. What could be causing this? – futium Jun 14 '23 at 17:13
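The headless Edge route asked about in the comments can also be scripted from Python, since Edge accepts Chromium's `--headless` and `--print-to-pdf` switches. This is a rough sketch, assuming the default Edge install path (hypothetical; check where `msedge.exe` lives on your machine). Passing the file as an absolute `file:///` URL lets relative CSS/JS paths resolve. As for links coming out as text followed by the href in parentheses, one possible cause worth checking is a print stylesheet: Bootstrap 3's print styles, for instance, append ` (href)` after links via `a[href]:after`, and the browser applies print styles when printing to PDF.

```python
import subprocess
from pathlib import Path

# Hypothetical default install path; msedge.exe may live elsewhere.
EDGE = r"C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe"


def edge_print_command(browser: str, html_file: Path, pdf_file: Path) -> list[str]:
    """Build a headless Edge/Chromium command that prints html_file to pdf_file."""
    return [
        browser,
        "--headless",
        "--disable-gpu",
        f"--print-to-pdf={pdf_file.resolve()}",
        # Absolute file:/// URL so relative CSS/JS/image paths resolve.
        html_file.resolve().as_uri(),
    ]


def convert_all(html_dir: Path, pdf_dir: Path, browser: str = EDGE) -> None:
    """Print every .html file in html_dir to a same-named PDF in pdf_dir."""
    pdf_dir.mkdir(parents=True, exist_ok=True)
    for html_file in sorted(html_dir.glob("*.html")):
        pdf_file = pdf_dir / html_file.with_suffix(".pdf").name
        subprocess.run(edge_print_command(browser, html_file, pdf_file), check=True)
```

Building the argument list separately from running it makes it easy to inspect the exact command, and `check=True` stops the batch at the first file Edge fails to print.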

1 Answer


I decided to use iText7 in C# to generate the PDF. Although many quality-of-life CSS features are missing (see the note below), it DOES support formatting and gives me more control. It's a shame to have to handwrite a lot of layout that CSS would normally take care of, but at least I can get exactly what I want, just with more effort.

Here is the code I used with iText7, for anyone who comes across this:

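```csharp
using System;
using System.IO;
using iText.Html2pdf;
using iText.Kernel.Pdf;

string outputPdfFilePath = "path/to/output.pdf";
string htmlFilePath = "path/to/input.html";

// PdfWriter writes the bytes to disk; PdfDocument is the PDF being built.
PdfWriter writer = new PdfWriter(outputPdfFilePath);
PdfDocument pdfDocument = new PdfDocument(writer);

// Open the HTML read-only and dispose the stream once conversion is done.
using (FileStream htmlStream = new FileStream(htmlFilePath, FileMode.Open, FileAccess.Read))
{
    // Parse the HTML and its supported CSS, laying it out into pdfDocument.
    HtmlConverter.ConvertToPdf(htmlStream, pdfDocument);
}

pdfDocument.Close();
```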

*To save future headaches, note that iText7 (at least the version I used):

  • Does not support the CSS calc() function, e.g. calc(1 * 0.2125in)
  • Does not support CSS variables, e.g. var(--some-var)
  • Does not support CSS Grid layout (display: grid)
  • Uses its own default page margins, so set them explicitly with an @page rule:
```css
@page {
    margin: 0.4250in;
}
```
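Since the HTML is generated from Python anyway, one possible workaround for the missing `var()` and `calc()` support is to resolve those values while writing the HTML, so iText only ever sees plain lengths. This is a minimal sketch: `inline_css_vars` and `inches` are hypothetical helpers, the regex only handles `var(--name)` without a fallback value, and `inches` simply precomputes the `calc(n * 0.2125in)` pattern from the note above.

```python
import re

# Matches var(--name) with no fallback value, e.g. var(--gap).
VAR_PATTERN = re.compile(r"var\(\s*(--[A-Za-z0-9_-]+)\s*\)")


def inline_css_vars(css: str, variables: dict[str, str]) -> str:
    """Replace each var(--name) in css with its concrete value from variables."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"undefined CSS variable {name}")
        return variables[name]

    return VAR_PATTERN.sub(substitute, css)


def inches(multiplier: float, unit_in: float = 0.2125) -> str:
    """Precompute calc(multiplier * 0.2125in) as a plain length string."""
    return f"{multiplier * unit_in:.4f}in"


css_template = ".card { padding: var(--pad); margin: var(--gap); }"
variables = {"--pad": inches(1), "--gap": inches(2)}
print(inline_css_vars(css_template, variables))
# prints: .card { padding: 0.2125in; margin: 0.4250in; }
```

Raising on an undefined variable, rather than leaving `var(...)` in place, surfaces typos at generation time instead of as silently unstyled PDFs.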

Hope this helps others.

futium