
We have an application that creates large .pptx files with over 1000 slides, using the python-pptx library.

The problem is that as the presentation grows, adding elements and/or charts to it becomes slower.

from pptx import Presentation
from pptx.chart.data import CategoryChartData
from pptx.enum.chart import XL_CHART_TYPE
from pptx.util import Inches


SLD_LAYOUT_TITLE_ONLY = 5  # layout index actually used for every new slide

prs = Presentation()

# Look the layout up once instead of on every iteration.
slide_layout = prs.slide_layouts[SLD_LAYOUT_TITLE_ONLY]
for idx in range(2000):
    slide = prs.slides.add_slide(slide_layout)

    chart_data = CategoryChartData()
    chart_data.categories = ['East', 'West', 'Midwest']
    chart_data.add_series('Series 1', (19.2, 21.4, 16.7))

    x, y, cx, cy = Inches(2), Inches(2), Inches(6), Inches(4.5)
    slide.shapes.add_chart(
        XL_CHART_TYPE.COLUMN_CLUSTERED, x, y, cx, cy, chart_data
    )

    print(idx)  # progress indicator; each iteration takes longer than the last

prs.save('test.pptx')

Has anyone come across this situation before? It seems that python-pptx has to look something up inside the Presentation on each iteration, which makes every iteration slower than the last. Or is it the way we are using Python to loop and load the variables into memory?
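One way to tell the two explanations apart is to time the loop in batches. The helper below is only a sketch: `time_batches` and its parameters are names made up for illustration, and `step` stands for whatever function adds one slide and chart.

```python
import time


def time_batches(step, total, batch_size):
    """Call step(idx) for idx in range(total); return seconds spent per full batch."""
    timings = []
    start = time.perf_counter()
    for idx in range(total):
        step(idx)
        if (idx + 1) % batch_size == 0:
            now = time.perf_counter()
            timings.append(now - start)  # elapsed time for this batch only
            start = now
    return timings
```

If the loop body above is wrapped in a function and passed as `step`, batch times that climb steadily would suggest the cost depends on the size of the presentation, whereas roughly constant batch times would point to plain per-iteration overhead.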

Diogo

2 Answers


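This appears to be O(N^2) behavior in how partnames are assigned to charts and slides: each new part triggers a search through the existing partnames for the next free number, so the cost of adding a part grows with the number of parts already in the package. More details are in the GitHub issue thread here: https://github.com/scanny/python-pptx/issues/644#issuecomment-685056215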
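To make the quadratic growth concrete, here is a small pure-Python model. It is a simplified stand-in and not the library's actual code: `next_partname_scan` and `total_checks` are hypothetical names, and the model simply assumes the next partname is found by trying candidate numbers from 1 upward.

```python
def next_partname_scan(existing, tmpl):
    """Return (partname, checks): the lowest unused name and how many candidates were tried."""
    checks = 0
    for n in range(1, len(existing) + 2):
        checks += 1
        candidate = tmpl % n
        if candidate not in existing:  # membership test against names already taken
            return candidate, checks
    raise AssertionError("unreachable: one of len(existing) + 1 candidates is free")


def total_checks(num_parts, tmpl="/ppt/slides/slide%d.xml"):
    """Add num_parts parts one at a time and count every candidate tried."""
    existing = []
    total = 0
    for _ in range(num_parts):
        name, checks = next_partname_scan(existing, tmpl)
        existing.append(name)
        total += checks
    return total


print(total_checks(10))   # 55
print(total_checks(100))  # 5050
```

In this model the k-th part needs k candidate checks, so N parts need N(N+1)/2 checks in total, which is why the time per added slide keeps rising.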

scanny

What I did was keep a counter in self.partnames, keyed by the path component of the received tmpl that follows /ppt/ (for example, slides or charts), and increment it by 1 on each call. This avoids looping over all existing partnames every time to work out which partname is available next.

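    def next_partname(self, tmpl):
        """
        Return a |PackURI| instance representing the next partname
        matching *tmpl*, which is a printf (%)-style template string
        containing a single replacement item, a '%d' to be used to insert the
        integer portion of the partname. Example: '/ppt/slides/slide%d.xml'
        """
        # Partnames always use '/', so split on it rather than os.sep
        # (which is '\\' on Windows): '/ppt/slides/slide%d.xml' -> 'slides'
        name = tmpl.split('/')[2]
        # One counter per part type; start from 0 the first time a type is seen.
        self.partnames[name] = self.partnames.get(name, 0) + 1
        candidate_partname = tmpl % self.partnames[name]
        return PackURI(candidate_partname)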

I know this could be improved further; I would welcome tips on anything I may have missed.
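For experimenting outside the library, the same counter idea can be written as a standalone class. This is a sketch under assumptions: `PartnameCounter` is a hypothetical helper that is not part of python-pptx, it returns plain strings rather than PackURI objects, and it seeds its counters from a list of existing partnames, which may matter when the presentation is opened from a file that already contains slides.

```python
import re
from collections import defaultdict


class PartnameCounter:
    """Hypothetical helper: hands out the next partname per template in O(1)."""

    def __init__(self, existing_partnames=()):
        self._last = defaultdict(int)  # template -> highest number used so far
        for partname in existing_partnames:
            self._seed(partname)

    def _seed(self, partname):
        # '/ppt/slides/slide12.xml' -> key '/ppt/slides/slide%d.xml', number 12
        match = re.fullmatch(r"(.*?)(\d+)(\.[^./]+)", partname)
        if match:
            prefix, number, ext = match.groups()
            key = prefix + "%d" + ext
            self._last[key] = max(self._last[key], int(number))

    def next_partname(self, tmpl):
        self._last[tmpl] += 1
        return tmpl % self._last[tmpl]


counter = PartnameCounter(["/ppt/slides/slide1.xml", "/ppt/slides/slide2.xml"])
print(counter.next_partname("/ppt/slides/slide%d.xml"))  # /ppt/slides/slide3.xml
```

Keying by the full template instead of the directory name is a design choice made here for simplicity; it keeps templates that share a directory on separate counters.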

simkusr