Running out of memory on ReadTheDocs

Question

I have a project on ReadTheDocs.

As part of the documentation generation, I have Sphinx compile a large number of images using matplotlib's plot directive showing what various commands do. It seems this takes more memory than RTD allocates for a build process. I'm trying to figure out what to do about this.

Thoughts I have:

1. I could pay RTD to increase my memory limit. But I am a small developer working on an as-yet boutique analysis tool, and their plan is pricey.
2. I could switch to a smaller dataset for my figure generation and hope this uses less memory. This kind of guess-and-check strategy is frustrating and may not be sustainable anyway, if the number of images increases or the computational complexity increases.
3. I could commit statically generated images to the existing repo and hack together an extension that generates new images only if the static image is not already present. But I do not like this because now my code repo will grow every time the images need to be changed for some reason, and I prefer to keep the repo light-weight.
4. I could commit the compiled documentation to a separate repo of some sort and upload that to RTD. This prevents the code repo from growing every time an image changes. However, I'm not sure how to tell RTD about this documentation.

What is a good way to include computationally-expensive auto-generated images in a ReadTheDocs project?

score 1 · Answer 1 · answered Jan 03 '18 at 05:48

At face value, option 3 is the best approach. If generating the images is computationally expensive, you obviously want to reduce those computations. Further, you shouldn't be storing the images. It sounds like you want to push that logic to the deploy provider. Keep in mind that the images can be cached on the user's computer as well, so there's no need to regenerate unchanged images anyway.

Now, another option would be to use a JavaScript library like plot.ly. Is generating the images or the plots computationally expensive? If generating the plots is cheap, then switching to a JavaScript library is the way to go.

Regarding option 4: how to do that is in the documentation.

score 1 · Answer 2 · answered Jan 03 '18 at 19:01

I ended up going with Option 4.

To do so, I modified matplotlib's Sphinx plot directive in the following ways.

I added an option so that the user can specify the output names of images. This eliminates the ambiguity as to which image is associated with which code chunk.
I added a configuration option which will place copies of the named output images in a separate directory where they can be version controlled. Images in this directory are copied into the build output prior to running the user's figure generation code; this pre-empts the need to run the code, reducing computation time.
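The reuse-or-regenerate logic described in those two changes can be sketched roughly as follows. This is a minimal illustration, not the actual code from the modified plot_directive.py: the function restore_or_generate and its parameters are hypothetical names, and the generate callback stands in for whatever runs the user's figure code.

```python
import shutil
from pathlib import Path
from typing import Callable


def restore_or_generate(
    image_name: str,
    preserve_dir: Path,
    build_dir: Path,
    generate: Callable[[Path], None],
) -> bool:
    """Place image_name in build_dir, reusing a preserved copy when one exists.

    Returns True if the preserved copy was reused, and False if the image
    had to be regenerated. All names here are hypothetical.
    """
    preserved = preserve_dir / image_name
    target = build_dir / image_name
    build_dir.mkdir(parents=True, exist_ok=True)

    if preserved.exists():
        # A version-controlled copy exists, so skip the expensive figure code.
        shutil.copy2(preserved, target)
        return True

    # No preserved copy: run the figure code, then keep a copy for next time.
    generate(target)
    preserve_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(target, preserved)
    return False
```

Because the lookup is keyed on a user-chosen image name, changing the plotting code does not invalidate the preserved copy by itself; in a scheme like this the stale image has to be deleted from the preserved directory before rebuilding.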

I then modified my Sphinx conf.py file to load and use this new plotting module.

Finally, I saved the resulting imagery in a submodule.

In order to update documentation, I now use the following workflow:

1. Run make html locally.
2. Commit changes to the imagery submodule and push it.
3. Commit changes on the primary repo and push it. This triggers RTD to rebuild.
4. RTD automagically loads the submodule, thereby acquiring the computationally expensive imagery, and runs make html on its build server. With the imagery already present, no intensive computation is done.

Modified conf.py

import os
import sys

# This line tells Sphinx to look for modules in the directory
# containing `conf.py`. This way it finds `plot_directive.py`.
sys.path.append(os.path.abspath('.'))

# This must come before plot_directive is loaded by Sphinx.
plot_preserve_dir = 'imagery-submodule-directory'

extensions = [
  #...
  'plot_directive',
  #...
]

My modified version of plot_directive.py is available here.
