
I'm trying out Pandoc on OS X, and results thus far are impressive. One blocking problem, however, is getting CSS styles to work on inline code samples. I'm converting from Markdown to PDF.

I have this string in my source:

* Create a simple HTML document (<span class="filename">simple.html</span>) and load it into the browser via the file system

I've also tried this:

* Create a simple HTML document (`simple.html`{.filename}) and load it into the browser via the file system

I'd like to apply the class "filename" to the enclosed text in each case, but it doesn't seem to have any effect on the output. However, the manual says:

Some output formats can use this information to do syntax highlighting. Currently, the only output formats that uses this information are HTML and LaTeX.

Here's my command:

pandoc \
    --output ./output.pdf \
    --css source/styles.css \
    source/en/docs/input.md

I'm converting to PDF, which Pandoc produces internally via LaTeX. Can I get this to work? Alternatively, can I use a style defined with a LaTeX command? It doesn't have to be CSS. However, it must be a style system: changing italic/font/colour attributes by hand on each occurrence is not workable.

I've tried sending the output to HTML temporarily, and in that case the styles are picked up from the specified stylesheet. So my stylesheet specification and span markup are correct, at least for one output format.

Addenda

A couple of afterthoughts:

  • The solution does not have to be Pandoc or Markdown. However, it does need to be a trivial text-based markup language that can convert reliably to PDF, as I want to store the document files on Git for easy forking and merging. I'm not keen on HTML as it is verbose, and engines to convert it aren't that great (though, admittedly, my formatting requirements are modest).
  • The HTML output from Pandoc is fine, so if I can find something that converts the (simple) HTML/CSS to PDF reliably, I'll be fine. Of course, Pandoc should be able to do this, but inline styles (for the background colour on code fragments) aren't rendered. This might be a faff, as I'll have to reintroduce things like page-breaks, which can be non-trivial in HTML-to-PDF converters.
halfer
  • btw, pandoc can use `wkhtmltopdf` if installed now: `pandoc -t html5 --css mystyles.css input.md -o output.pdf` – mb21 Dec 05 '16 at 14:11
  • Thanks @mb21, appreciated. I take it that styles are preserved from Markdown to PDF? I have forgotten the details of this problem, but from reading the above it looks like HTML to PDF conversion in Pandoc used to go via LaTeX, which stripped out class information. – halfer Dec 05 '16 at 16:38
  • for pdf generation with pandoc, you can now use either latex (the default) or wkhtmltopdf (which will indeed receive the classes/styles on spans and divs). – mb21 Dec 05 '16 at 21:41
  • @mb21: lovely, good to hear! If I need this again I will try that configuration. – halfer Dec 05 '16 at 22:01
  • Related: [How to convert Markdown + CSS -> PDF?](https://stackoverflow.com/questions/23825317/how-to-convert-markdown-css-pdf) – Gabriel Staples Oct 08 '20 at 07:26
  • @GabrielStaples is there a way to add or fix CSS for HTML-to-PDF conversion using Pandoc? What I observe is that PDFs converted by Pandoc are not clean compared with an exact HTML-to-PDF conversion. – Mahesh Mesta Jul 22 '21 at 13:40

2 Answers


"I'd like to apply the class "filename" to the enclosed text in each case, but it doesn't seem to do anything to the output."

It works for HTML. Run Pandoc interactively and press ^D to see the resulting code:

$>  pandoc -f markdown -t html

* Create a simple HTML document (`simple.html`{.filename}) and load it.

^D

<ul>
<li>Create a simple HTML document (<code class="filename">simple.html</code>) and load it.</li>
</ul>

It doesn't work for LaTeX if you use the .filename class; the class is simply dropped:

$>  pandoc -f markdown -t latex

* Create a simple HTML document (`simple.html`{.filename}) and load it.

^D

\begin{itemize}
\tightlist
\item
  Create a simple HTML document (\texttt{simple.html}) and load it.
\end{itemize}

Now using one of the known class names (the language names used for syntax highlighting), like .bash, .postscript, .php, ...:

$>  pandoc -f markdown -t latex

* Create a simple HTML document (`simple.html`{.bash}) and load it.

^D

\begin{itemize}
\tightlist
\item
  Create a simple HTML document (\VERB|\KeywordTok{simple.html}|) and
  load it.
\end{itemize}

To convert HTML + CSS into PDF, you can also look into PrinceXML, which is free for non-commercial use.

Kurt Pfeifle
  • Hi Kurt - I think I meant to come back to your answer here with thanks and comment - apologies that I didn't do so at the time. I am no longer in immediate need of this - though I may be in the future, so it'll be useful to refer to. I'd considered PrinceXML, but the free version places a watermark on the documents, and I didn't want this. – halfer Aug 27 '15 at 16:50
  • @halfer: the only free utility which *may* be able to handle HTML with CSS sufficiently well when converting to PDF is `wkhtmltopdf`. But I'm not sure how well it really works with your CSS. PrinceXML rocks, and I don't mind the un-obtrusive watermark on the first page. If you don't want it, pay :-) – Kurt Pfeifle Aug 27 '15 at 17:38
  • I've used Prince commercially - it is excellent, though IMO very expensive. However this was for a free project, so payment was not an option. But, I understand the watermark/payment dilemma in general `:-)`. – halfer Aug 27 '15 at 17:40
  • I can't remember if I tried `wkhtmltopdf` for the use case that prompted this question. It was a potential rival to Prince when I used it commercially, but `wkhtmltopdf` didn't handle multi-page tables very well. Prince was much better in this regard. – halfer Aug 27 '15 at 17:43
  • Biased as a developer for them, but https://DocRaptor.com is a hosted provider of PrinceXML that offers a different pricing and delivery model. `wkhtmltopdf` is the best OS system we've played with, though. – jamespaden Aug 28 '15 at 12:42

I don't know LaTeX at all, but have hacked together this solution using this helpful manual. First, create a style:

\usepackage{xcolor} % provides \definecolor and \colorbox
\definecolor{silver}{RGB}{230,230,230}

\newcommand{\inlinecodeblock}[1]{
    \colorbox{silver}{
        \texttt{#1}
    }
}

And here's how to use it:

Some \inlinecodeblock{inline code}, and some widgets, go here

This creates a style with a background colour and a monospaced font. The margin and padding are a bit large for my taste, but it's a very usable start. Here's what it looks like:

Screenshot of inline formatting

The disadvantage is that if I want to output to a format that supports proper styles (such as HTML), these styles are lost; my solution only works for LaTeX/PDF. If you can fix this, please add a better answer!


Addendum: I have found a better approach:

\newcommand{\inlinecodeblock}[1]{
    \fboxsep 1pt
    \fboxrule 0pt
    \colorbox{silver}{\strut{\texttt{#1}}}
}

This avoids the excess horizontal padding; I think it was the line breaks inside the \colorbox argument that caused it. I've also added \strut, which keeps the highlights the same height regardless of whether the text has descenders.

It's not perfect, though: there's still too much horizontal margin outside the box, and a comma after a box can still be orphaned onto the next line. I may give up on LaTeX, render to HTML from Pandoc, and then use wkhtmltopdf to produce the final document.
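The limitations mentioned above (the macro has to be typed into the Markdown by hand, and it only helps LaTeX/PDF output) could perhaps be addressed with a Pandoc filter. The sketch below is not part of the original answer and has not been checked against every Pandoc version. It assumes Pandoc's JSON AST layout, in which an inline code span is `{"t": "Code", "c": [[id, classes, attributes], text]}` and raw output is `{"t": "RawInline", "c": [format, text]}`, and that `\inlinecodeblock` is defined in the LaTeX preamble as in this answer. The file name `filename_filter.py` is hypothetical. The idea is to keep the question's `{.filename}` attribute syntax in the source: for LaTeX output the filter rewrites those spans to `\inlinecodeblock{...}`, and for every other format it changes nothing, so HTML still gets `<code class="filename">` for the CSS.

```python
#!/usr/bin/env python3
# filename_filter.py (hypothetical name): a minimal Pandoc JSON filter sketch.
# For LaTeX output, inline code marked {.filename} becomes \inlinecodeblock{...};
# for any other output format (e.g. HTML) the AST is passed through unchanged,
# so <code class="filename"> and the CSS rule still apply.
import json
import sys

# Characters that are special in LaTeX and must be escaped in the raw output.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "{": r"\{",
    "}": r"\}",
    "#": r"\#",
    "$": r"\$",
    "%": r"\%",
    "&": r"\&",
    "_": r"\_",
    "^": r"\textasciicircum{}",
    "~": r"\textasciitilde{}",
}

# Output formats for which the LaTeX macro is emitted (assumption).
LATEX_FORMATS = ("latex", "beamer")


def latex_escape(text):
    # Escape character by character so replacements are never re-escaped.
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)


def transform(node, fmt, cls="filename"):
    # Walk the JSON AST recursively, rebuilding lists and dicts as we go.
    if isinstance(node, list):
        return [transform(item, fmt, cls) for item in node]
    if isinstance(node, dict):
        if node.get("t") == "Code" and fmt in LATEX_FORMATS:
            # Assumed layout: {"t": "Code", "c": [[id, classes, keyvals], text]}
            (_ident, classes, _keyvals), text = node["c"]
            if cls in classes:
                raw = r"\inlinecodeblock{" + latex_escape(text) + "}"
                return {"t": "RawInline", "c": ["latex", raw]}
        return {key: transform(value, fmt, cls) for key, value in node.items()}
    # Strings, numbers, booleans and None are leaves.
    return node


def main():
    # Pandoc passes the target output format as the first argument.
    fmt = sys.argv[1]
    doc = json.load(sys.stdin)
    json.dump(transform(doc, fmt), sys.stdout)


# Pandoc always supplies the format argument, so only run the filter then.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

To try it, one could put `\usepackage{xcolor}` plus the `\definecolor`/`\newcommand` lines from this answer into a header file (hypothetically `preamble.tex`), make the script executable, and run something like `pandoc --filter ./filename_filter.py -H preamble.tex -o output.pdf source/en/docs/input.md`. For PDF produced through LaTeX, the format argument Pandoc passes is expected to be `latex`; if your version passes something else, adjust `LATEX_FORMATS`. Rendering the same source with `-o output.html --css source/styles.css` should leave the `filename` class in place for the stylesheet.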

halfer
  • LaTeX considers linebreaks as whitespace unless escaped by a comment, i.e. when in doubt, put a `%` at the end of your lines and your extra spaces should go away. – Ulrich Schwarz Jul 29 '13 at 09:36
  • Thanks @Ulrich, this was fixed by swapping the indented approach in my first example to a single line. So that's one LaTeX problem down, eight more to go... at this rate I may stick with Open Office `;-)`. – halfer Jul 29 '13 at 15:20
  • Incidentally @Ulrich, if you know of a simple way to convert a mergeable text format like Markdown to PDF, whilst retaining widow/orphan control, keep-with-next-paragraph, inline styles, block styles, etc then I would much appreciate your suggestions. I expect this can be done in LaTeX, but I don't have the patience to learn a new language, especially given the same can be done instantly in Open Office. I'd like to ask a new question on Stack Overflow, but it would unfortunately be o/t here. – halfer Jul 29 '13 at 15:50
  • +1 for ".. I don't know Latex, but..." and still getting a "solved" check. – IRTFM Aug 24 '14 at 00:13
  • Thanks @BondedDust - though it was a self-answer `:-)`. – halfer Aug 26 '14 at 12:12
  • (Aside: the downvote just now appears to be someone who is downvoting unrelated answers of mine, and is not a reflection upon the content of this answer at all). – halfer Aug 26 '14 at 12:12