There are lots of answers about how to output a page break to .docx files with Pandoc, but is there any way to detect page breaks when reading from a .docx?

I know Pandoc's AST has no page-break element, but I had hoped to catch the break with, e.g., a Lua filter on RawBlock:

function RawBlock (el)
  -- A Block filter must return a block, so wrap the marker text in a Para.
  return pandoc.Para { pandoc.Str "PAGE BREAK" }
end

return {
  {RawBlock = RawBlock}
}

However, that doesn't work, presumably because the docx reader simply drops the page break rather than turning it into a RawBlock.

The only solution I can think of is to pre-process the .docx with an XML parser and replace every <w:br w:type="page"/> with a magic string that a filter can then detect. But needing a separate XML parser rather defeats the point of using Pandoc in the first place.
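For reference, here is a minimal sketch of that pre-processing step in Python. It is only an illustration under some assumptions: the marker text `PAGEBREAK_MARKER` and the function names are made up, the main document part is assumed to live at `word/document.xml`, the break is assumed to be written as a self-closing element with the usual `w:` prefix, and a plain regex substitution is used instead of a real XML parser so that the rest of the markup stays byte-for-byte untouched. Other page-break mechanisms (such as `w:pageBreakBefore`) are not handled.

```python
import re
import zipfile

# Hypothetical magic string; pick anything that cannot occur in real text.
MARKER = "PAGEBREAK_MARKER"

# Matches self-closing <w:br .../> elements that carry w:type="page".
# Assumes the WordprocessingML namespace uses the conventional "w" prefix.
_PAGE_BREAK = re.compile(r'<w:br\b[^>]*\bw:type="page"[^>]*/>')


def mark_page_breaks(document_xml: str, marker: str = MARKER) -> str:
    """Replace each explicit page break with a text element holding the marker."""
    return _PAGE_BREAK.sub(f"<w:t>{marker}</w:t>", document_xml)


def preprocess_docx(src: str, dst: str, marker: str = MARKER) -> None:
    """Copy the .docx at src to dst, rewriting only word/document.xml."""
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/document.xml":
                text = data.decode("utf-8")
                data = mark_page_breaks(text, marker).encode("utf-8")
            # Passing the original ZipInfo keeps the entry name and compression type.
            zout.writestr(item, data)
```

After calling `preprocess_docx("input.docx", "marked.docx")` (hypothetical file names), the marker should come through Pandoc's docx reader as ordinary text, so a Lua filter on `Str` could look for it and emit whatever page-break representation the output format needs.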

Alec
  • Try the newest pandoc version and run `pandoc input.docx -t native` to see everything you can match on with a filter. If the page break is not there, you'll indeed have to preprocess it... – mb21 Oct 27 '20 at 16:03
  • Yeah, sadly nothing in there. – Alec Oct 27 '20 at 16:15
  • Yeah, ultimately pandoc will have to add a pagebreak element to its AST, see https://github.com/jgm/pandoc/issues/1934 – mb21 Oct 28 '20 at 08:12