There are lots of answers about how to output a page break to .docx files with Pandoc, but is there any way to detect page breaks when reading from a .docx?
I know Pandoc's AST doesn't support the idea of a page break, but I'd been hoping to be able to use, e.g., a Lua filter with a RawBlock:
    function RawBlock (el)
      return pandoc.Str "PAGE BREAK"
    end

    return {
      {RawBlock = RawBlock}
    }
However, that doesn't work (presumably because the page break is simply ignored, rather than turned into a RawBlock?).
The only solution I can think of is to pre-process the .docx using an XML parser and replace all instances of <w:br w:type="page"/> with a magic string, which we can then detect, but using a separate XML parser sort of defeats the point of using Pandoc in the first place.
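
For reference, here is a minimal sketch of the pre-processing workaround I have in mind, written in Python. It is only an illustration under some assumptions: the marker text PANDOC_PAGE_BREAK and the function names are made up, it uses a regular expression rather than a real XML parser, it only looks at word/document.xml, and it only catches explicit <w:br w:type="page"/> elements (not other ways Word can start a new page).

```python
import re
import zipfile

# Hypothetical magic string; pick anything that cannot occur in real text.
MARKER = "PANDOC_PAGE_BREAK"

# Matches <w:br .../> elements carrying w:type="page", but not a plain <w:br/>.
PAGE_BREAK = re.compile(r'<w:br\b[^>]*\bw:type="page"[^>]*/>')


def mark_page_breaks_xml(xml, marker=MARKER):
    """Replace each explicit page break with a text run element holding the marker."""
    return PAGE_BREAK.sub(f"<w:t>{marker}</w:t>", xml)


def mark_page_breaks(src, dst, marker=MARKER):
    """Copy the .docx at src to dst, rewriting only word/document.xml."""
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/document.xml":
                text = data.decode("utf-8")
                data = mark_page_breaks_xml(text, marker).encode("utf-8")
            zout.writestr(item, data)
```

After something like mark_page_breaks("in.docx", "marked.docx"), Pandoc should read the marker as ordinary text, so a Lua filter could presumably look for it in Str elements instead of waiting for a RawBlock that never arrives.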