I am developing an infrastructure in which developers document their verification tests using Jupyter notebooks. One part of the infrastructure will be a Python script that converts their .ipynb files to .html files, providing public-facing documentation of the tests.

Using the nbconvert module does most of what I want, but I would like to allow citations and references in the final HTML file. I can use pypandoc to generate HTML text that converts the citations to proper inlined syntax and adds a References section:

import nbformat
import pypandoc
from nbconvert import MarkdownExporter

# Read the notebook from a local file (Python 3 has no urllib.urlopen)
notebook = nbformat.read('SimpleExample.ipynb', as_version=4)
exporter = MarkdownExporter()
(body, resources) = exporter.from_notebook_node(notebook)

filters = ['pandoc-citeproc']
extra_args = ['--bibliography=ref.bib',
              '--reference-links',
              '--csl=MWR.csl']
new_body = pypandoc.convert_text(body,
                                 'html',
                                 'md',
                                 filters=filters,
                                 extra_args=extra_args)

The problem is that this generated HTML loses all of the considerable formatting and other capabilities provided by nbconvert.HTMLExporter.

Is there a straightforward way to merge the results of nbconvert.HTMLExporter and pypandoc.convert_text() so that I get mostly the former, with the inline citations and References section from the latter added?

Bill Spotz

1 Answer

I don't know that this counts as "straightforward", but I was able to come up with a solution. It involves writing a class that inherits from nbconvert.preprocessors.Preprocessor and implements the preprocess(self, nb, resources) method. Here is what preprocess() does:

  1. Loop over every cell in the notebook and store a set of citation keys (these are of the form [@bibtex_key])
  2. Create a short body of text consisting of only these citation keys, each separated by '\n\n'
  3. Use the pandoc conversion above to generate HTML text from this short body of text. If num_cite is the number of citations, the first num_cite lines of the generated text will be the inline versions of the citations (e.g. '(Author, Year)'); the remaining lines will be the content of the references section.
  4. Go back through each cell and substitute the inline text of each citation for its key.
  5. Add a cell to the notebook with ## References
  6. Add a cell to the notebook with the content of the references section
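
The six steps might be sketched roughly as follows. This is a minimal illustration and not the code I actually use: the class name CitationPreprocessor, the helper pandoc_render, the injectable render argument, and the single-key regular expression are all assumptions made for the example. It also relies on the assumption from step 3 that pandoc emits exactly one output line per citation paragraph, which may not hold if pandoc wraps long lines.

```python
import re

import nbformat
from nbconvert.preprocessors import Preprocessor

# Matches single-key citations such as [@smith2020]; multi-key or
# prefixed forms like [see @a; @b] are deliberately not handled here.
CITE_RE = re.compile(r"\[@[^\]\s;]+\]")


def pandoc_render(text):
    """Convert Markdown to HTML with the pandoc settings from the question."""
    import pypandoc  # imported lazily so the class is usable without pandoc
    return pypandoc.convert_text(
        text, "html", "md",
        filters=["pandoc-citeproc"],
        extra_args=["--bibliography=ref.bib", "--csl=MWR.csl"],
    )


class CitationPreprocessor(Preprocessor):
    """Hypothetical preprocessor following the six steps listed above."""

    def __init__(self, render=pandoc_render, **kwargs):
        super().__init__(**kwargs)
        self.render = render  # callable: Markdown text -> HTML text

    def preprocess(self, nb, resources):
        md_cells = [c for c in nb.cells if c.cell_type == "markdown"]
        # 1. Collect the distinct citation keys in order of first appearance.
        keys = []
        for cell in md_cells:
            for key in CITE_RE.findall(cell.source):
                if key not in keys:
                    keys.append(key)
        if not keys:
            return nb, resources
        # 2-3. Render one key per paragraph, then split the output by line.
        lines = self.render("\n\n".join(keys)).splitlines()
        inline = {
            key: re.sub(r"^<p>|</p>$", "", line.strip())
            for key, line in zip(keys, lines)
        }
        references = "\n".join(lines[len(keys):])
        # 4. Replace each key with its rendered inline citation.
        for cell in md_cells:
            cell.source = CITE_RE.sub(lambda m: inline[m.group(0)], cell.source)
        # 5-6. Append a heading cell and a cell holding the reference list.
        nb.cells.append(nbformat.v4.new_markdown_cell("## References"))
        nb.cells.append(nbformat.v4.new_markdown_cell(references))
        return nb, resources
```

Passing the renderer in as an argument keeps the pandoc call separate from the cell bookkeeping, so the substitution logic can be exercised with a stub. To use the sketch, create the preprocessor with enabled=True and pass it to HTMLExporter.register_preprocessor() before calling from_notebook_node().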

Now, when an HTMLExporter, using this Preprocessor, converts a notebook, the results will have inline citations, a reference section, and all of the formatting you expect from the HTMLExporter.

Bill Spotz