I want to convert the following *.md into proper LaTeX *.tex.

Lorem *ipsum* something.
Does anyone know lorem by heart?

That would *sad* because there's always Google.

Expected Behavior / Resulting LaTeX from Pandoc

Lorem \emph{ipsum} something.
Does anyone know lorem by heart?

That would \emph{sad} because there's always Google.

Observed Behavior / Resulting LaTeX from Pandoc

Lorem \emph{ipsum} something. Does anyone know lorem by heart?

That would \emph{sad} because there's always Google.

Why do I care?

1. I'm transitioning a bigger git repo from markdown to LaTeX, and I want a clean diff and history.
2. I actually like my LaTeX with one sentence per line, even though it does not matter for the typesetting.

How can I get Pandoc to do this?

P.S.: I am aware of the hard_line_breaks extension, but that only adds \\ between the first two lines and does not actually preserve my line breaks.

maxheld

3 Answers

Update

Since pandoc 1.16, this is possible:

pandoc --wrap=preserve
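
For a repository with many files, the call above could be wrapped in a small script. The following is only a sketch: the file names and the helper names `pandoc_preserve_command` and `convert` are made up for illustration, and it assumes pandoc 1.16 or later is on the PATH and that pandoc infers the formats from the `.md` and `.tex` extensions.

```python
import shutil
import subprocess


def pandoc_preserve_command(source: str, target: str) -> list[str]:
    """Build a pandoc call that keeps the line breaks of the source file."""
    # --wrap=preserve is the option named in the answer; the file arguments
    # and the -o output flag are standard pandoc usage.
    return ["pandoc", "--wrap=preserve", source, "-o", target]


def convert(source: str, target: str) -> None:
    """Run pandoc if it is installed; fail with a clear message otherwise."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    subprocess.run(pandoc_preserve_command(source, target), check=True)


if __name__ == "__main__":
    # Hypothetical file names, matching the example in the question.
    print(" ".join(pandoc_preserve_command("markdown.md", "markdown.tex")))
```

Calling `convert("markdown.md", "markdown.tex")` would then write the LaTeX file with the original one-sentence-per-line layout intact.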

Old answer

Since Pandoc converts the Markdown to an AST-like internal representation, your non-semantic linebreaks are lost. So what you're looking for is not possible without some custom scripting (like using --no-wrap and then processing the output by inserting a line-break wherever there is a dot followed by a space).
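
The custom scripting mentioned above might look roughly like this. It is a naive sketch, not a robust sentence splitter: the function name `one_sentence_per_line` is made up, it also breaks after `?` and `!` (an assumption beyond the "dot followed by a space" rule), and it will wrongly split after abbreviations such as "e.g. ". It expects unwrapped pandoc output, i.e. one paragraph per line.

```python
import re


def one_sentence_per_line(text: str) -> str:
    """Put each sentence on its own line.

    Every run of spaces that follows '.', '!' or '?' is replaced by a
    newline. Existing newlines (and so paragraph breaks) are left alone.
    """
    return re.sub(r"(?<=[.!?]) +", "\n", text)


if __name__ == "__main__":
    unwrapped = (
        "Lorem \\emph{ipsum} something. Does anyone know lorem by heart?\n"
        "\n"
        "That would \\emph{sad} because there's always Google.\n"
    )
    print(one_sentence_per_line(unwrapped), end="")
```

On pandoc versions before 1.16 the unwrapped input would come from `--no-wrap`, as described above; on newer versions, where `--wrap=preserve` exists, this post-processing should not be needed at all.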

However, you can use the --columns NUMBER option to specify the number of characters on each line. So you won't have one sentence per line, but at most NUMBER characters per line.

mb21
  • thanks a bunch – I see that that makes sense in terms of what Pandoc is supposed to do. I ended up having to manually reformat it; last time I'm changing markup mid-project :) – maxheld Nov 06 '14 at 09:09
  • For some reason, this isn't working for me. Any ideas? – Michael Mar 03 '18 at 20:56
  • Thanks, this is exactly what I was searching for! – v01pe Sep 02 '20 at 22:11

A much simpler solution would be to add two spaces after "...something.". This will add a manual line break (the method is mentioned in the Pandoc Manual).

René
  • Thanks @René, I understand that possibility. The point here was to figure out a way to convert the `*.md` into `*.tex` with minimal diff pollution. I guess adding two spaces might have added such spurious diffs. Anyway, I understand what @mb21 wrote earlier – linebreaks in tex are non-semantic, so they *must be lost* on Pandoc conversion. I was just *using it wrong*. – maxheld Jul 11 '15 at 09:40
  • This should be marked as the correct answer since it provides a solution and it is the one with the least overhead. – moestly Apr 27 '20 at 12:49

I figured out another way to address this problem – which is to not change the original *.mds (under version control), but to simply read them in and to have them "pandoced" when building the PDF.

Here's how:

Some markdown.md in project root:

Happy one-sentence-per-line **markdown** stuff.
And another line – makes for clear git diffs!

And some latexify.tex in project root:

\documentclass{article}
\begin{document}

\immediate\write18{pandoc markdown.md -t latex -o tmp.tex}
\input{tmp.tex}

\end{document}

Works just dandy if you have some markdown components in a LaTeX project, e.g. GitHub READMEs or something similar.

Requires no special package, but compilation must run with shell-escape enabled (e.g. pdflatex -shell-escape latexify.tex).

maxheld