
I am printing the text from a .pptx file, but single sentences are being split across several lines somewhere along the way. Here is a screenshot of one slide: [screenshot]

When I read the file with the code below:

from pptx import Presentation

# path_to_presentation is the path to the .pptx file
prs = Presentation(path_to_presentation)
for slide in prs.slides:
    for shape in slide.shapes:
        # skip pictures and other shapes that cannot hold text
        if not shape.has_text_frame:
            continue
        for paragraph in shape.text_frame.paragraphs:
            for run in paragraph.runs:
                print(run.text)

I get output like this:

Books include:
Learning Python 
by Mark Lutz
Python Essential Reference 
by David Beazley
Python Cookbook
, ed. by Martelli, Ravenscroft and Ascher
(online at http://code.activestate.com/recipes/langs/python/)
http://wiki.python.org/moin/PythonBooks

If you compare the screenshot of the .pptx with the printed text, you can see that bullet points are split into two or more lines. For example, "Learning Python by Mark Lutz" is printed as two lines, "Learning Python" and "by Mark Lutz", and the bullets themselves are missing.

How can I fix this?

RonyA

1 Answer


The short answer is to use paragraph.text, not run.text:

for paragraph in shape.text_frame.paragraphs:
    print(paragraph.text)
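
For completeness, here is a minimal sketch of the whole loop as a reusable function. The name `paragraph_lines` and the two-space indent are illustrative choices, not part of python-pptx. The bullet glyphs are not part of a paragraph's text, so the sketch assumes that indenting by `paragraph.level` is an acceptable way to keep the bullet hierarchy visible.

```python
def paragraph_lines(prs, indent="  "):
    """Return one string per paragraph, indented by its bullet level.

    `prs` is expected to behave like a python-pptx Presentation.
    """
    lines = []
    for slide in prs.slides:
        for shape in slide.shapes:
            # Pictures, connectors, etc. have no text frame.
            if not shape.has_text_frame:
                continue
            for paragraph in shape.text_frame.paragraphs:
                # paragraph.text is the full paragraph, however many runs it has.
                lines.append(indent * paragraph.level + paragraph.text)
    return lines


# Usage, assuming python-pptx is installed and path_to_presentation is defined:
#     from pptx import Presentation
#     for line in paragraph_lines(Presentation(path_to_presentation)):
#         print(line)
```

Returning a list rather than printing directly makes it easier to post-process the text, for example to drop empty paragraphs.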

A paragraph is a coherent block of text that flows between margins without a vertical break. This is a user-level distinction because it affects how we read the content. A run is a sequence of characters that share the same character formatting (font, bold, italic, and so on). A run is a technical distinction: its boundaries should not be apparent to a reader. Runs exist only to tell PowerPoint "apply this character formatting to all these characters".

If you print every run separately, the text breaks at seemingly random places in the paragraph: at least wherever italics turn on and off, but frequently at other places too, such as where someone edited the text to add a few characters. PowerPoint does not necessarily minimize the number of runs, even when two consecutive runs have the same formatting. Consequently, runs tend to proliferate.
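
To make this concrete, here is a small sketch that models the "Learning Python by Mark Lutz" bullet from the question. It uses plain stand-in classes (hypothetical `Run` and `Paragraph`, not the python-pptx classes) and assumes the italic title and the plain author name were stored as two runs. Printing run by run reproduces the split, while joining the runs, which is roughly what `paragraph.text` does for you, gives the line back.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Run:
    """Stand-in for a run: characters sharing one character format."""
    text: str
    italic: bool = False


@dataclass
class Paragraph:
    """Stand-in for a paragraph: an ordered list of runs."""
    runs: List[Run] = field(default_factory=list)

    @property
    def text(self) -> str:
        # The paragraph's text is its runs' text joined in order.
        return "".join(run.text for run in self.runs)


# Assumed run layout: italic book title, then the plain author name.
bullet = Paragraph([Run("Learning Python ", italic=True), Run("by Mark Lutz")])

per_run = [run.text for run in bullet.runs]  # one item per run, as in the question
per_paragraph = bullet.text                  # one string for the whole paragraph

print(per_run)        # ['Learning Python ', 'by Mark Lutz']
print(per_paragraph)  # Learning Python by Mark Lutz
```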

scanny
  • Thanks for the wonderful explanation. I was going through the python-pptx docs and found that we can create a .pptx with Python and add new slides with .add_slide. However, when I tried to build a new 4-slide .pptx from the old 4-slide .pptx, I was unable to do so. My approach might be wrong. The code is below. – RonyA Sep 21 '18 at 05:04
  • I added 2-3 more lines to the existing code, but it's not working: `from pptx import Presentation prs1 = Presentation() prs = Presentation('Python.pptx') title_slide_layout = prs1.slide_layouts[0] for slide in prs.slides: for shape in slide.shapes: if not shape.has_text_frame: continue for paragraph in shape.text_frame.paragraphs: # print(paragraph.text) s=prs1.slides.add_slide(paragraph.text) prs1.save('new.pptx')` – RonyA Sep 21 '18 at 05:06
  • That sounds like a new question @RonyA, if you post it separately I'll have a look. Be sure to use the `python-pptx` tag on the new question. – scanny Sep 21 '18 at 17:05
  • Here is the link - https://stackoverflow.com/questions/52448755/create-new-pptx-using-existing-pptx-python-pptx – RonyA Sep 21 '18 at 17:46