I am using the python-pptx module to automatically update values in a PowerPoint file. I am able to extract all the text in the file using the code below:

from pptx import Presentation
prs = Presentation(path_to_presentation)
# text_runs will be populated with a list of strings,
# one for each text run in presentation
text_runs = []
for slide in prs.slides:
  for shape in slide.shapes:
    if not shape.has_text_frame:
      continue
  for paragraph in shape.text_frame.paragraphs:
    for run in paragraph.runs:
      text_runs.append(run.text)

This code extracts the text in ordinary text frames but fails to extract text that is inside a PowerPoint table, and I would like to update some of those values. I have tried to adapt code from this question: Reading text values in a PowerPoint table using pptx? but could not get it to work. Any ideas? Thanks.

tjpereira17

3 Answers

This works for me:

    def access_table(prs):
        # prs is the Presentation object opened in the question
        slide = prs.slides[0]  # first slide
        table = slide.shapes[2].table  # the shape index depends on the slide: 0..n
        for r in table.rows:
            s = ""
            for c in r.cells:
                s += c.text_frame.text + " | "
                # to write:
                # c.text_frame.text = "example"
            print(s)
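
Since the question is about updating values, the commented-out write above can be grown into a small helper. This is only a sketch and not part of the original answer: `replace_in_table` is a hypothetical name, and it assumes the text to replace sits entirely within a single run. It edits `run.text` instead of assigning to `c.text_frame.text`, because assigning to the whole text frame replaces the existing runs and may discard their character formatting.

```python
def replace_in_table(table, old, new):
    """Replace `old` with `new` in every run of every cell.

    Returns the number of runs that were changed.
    """
    changed = 0
    for row in table.rows:
        for cell in row.cells:
            for paragraph in cell.text_frame.paragraphs:
                for run in paragraph.runs:
                    if old in run.text:
                        run.text = run.text.replace(old, new)
                        changed += 1
    return changed

# Typical use (file names, shape index and strings are placeholders):
# prs = Presentation("in.pptx")
# replace_in_table(prs.slides[0].shapes[2].table, "2014", "2015")
# prs.save("out.pptx")
```

Text that is split across several runs (for example, partly bold) would not be matched by this sketch.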
Watash1

How to Extract All of the Text out of Tables Inside of a Slide-show Presentation

The following code extracts text from tables in a slide-show presentation. Text in the presentation outside of tables is omitted, but you can modify my code to capture text from non-table objects as well.

from pptx import Presentation

def get_tables_from_presentation(pres):
   """
   The input parameter `pres` should receive
   an object returned by `pptx.Presentation()`

   EXAMPLE:
       ```
       import pptx
       p = "C:\\Users\\user\\Desktop\\power_point_pres.pptx"
       pres = pptx.Presentation(p)

       tables = get_tables_from_presentation(pres)
       ```
   """
   tables = []
   for slide in pres.slides:
      for shp in slide.shapes:
         if shp.has_table:
            tables.append(shp.table)
   return tables


def iter_to_nonempty_table_cells(tbl):
   """
   :param tbl: 'pptx.table.Table'
          input table is NOT modified

   :return: iterator over the stripped text of each non-empty cell
   """
   for ridx in range(len(tbl.rows)):
      for cidx in range(len(tbl.columns)):
         txt = tbl.cell(ridx, cidx).text.strip()
         # keep every cell with visible text, including single characters
         if txt:
            yield txt


# establish read path
in_file_path = "C:\\Users\\user\\Desktop\\power_point_pres.pptx"

# Open slide-show presentation
pres = Presentation(in_file_path)

# extract tables from slide-show presentation
tables = get_tables_from_presentation(pres)

for tbl in tables:
   it = iter_to_nonempty_table_cells(tbl)
   # separate the cell texts so that neighbouring cells do not run together
   print(" | ".join(it))
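
As noted at the top of this answer, the code can be adapted to capture text outside tables too. The following is a rough sketch of one way to do that, not a tested drop-in: `iter_slide_text` is a hypothetical name, and it assumes every shape exposes `has_table` and `has_text_frame` the way python-pptx shapes do. It yields the 1-based slide number together with each non-empty piece of text.

```python
def iter_slide_text(pres):
    """Yield (slide_number, text) for table cells and text frames alike."""
    for number, slide in enumerate(pres.slides, start=1):
        for shape in slide.shapes:
            if shape.has_table:
                # a table: visit every cell, row by row
                for row in shape.table.rows:
                    for cell in row.cells:
                        text = cell.text.strip()
                        if text:
                            yield number, text
            elif shape.has_text_frame:
                # an ordinary text box, title, placeholder, ...
                text = shape.text_frame.text.strip()
                if text:
                    yield number, text
```

With `pres` opened as above, `for number, text in iter_slide_text(pres): print(number, text)` would list the text slide by slide.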

A Note About One of the Other Answers to This Question

Someone else posted a semi-useful answer to this question written in pseudo-code. They wrote the following:

For r = 1 to tbl.rows.count
  For c = 1 to tbl.columns.count
     tbl.cell(r,c).Shape.Textframe.Text

The problem is, that is not python.

In Python, `For r = 1 to 10` is a syntax error. Instead, we would write something like one of the following:

for r in range(1, 11):
   print(r)  

from itertools import count, takewhile
for r in takewhile(lambda k: k <= 10, count(1)):
   print(r)

Additionally, the row indices start at r = 0, not r = 1.

The upper-left corner of the table is tbl.cell(0,0), not tbl.cell(1,1).

Neither the rows attribute nor the columns attribute has a .count attribute, so (For r = 1 to tbl.rows.count) cannot be translated literally. The number of rows and columns is available as len(tbl.rows) and len(tbl.columns).

tbl.cell(r,c).Shape won't work, because objects instantiated from the class pptx.table._Cell have no attribute named Shape

cell objects have the following attributes:

  • fill
  • is_merge_origin
  • is_spanned
  • margin_bottom
  • margin_left
  • margin_right
  • margin_top
  • merge
  • part
  • span_height
  • span_width
  • split
  • text
  • text_frame
  • vertical_anchor

A fix is shown below:

# ----------------------------------------
# BEGIN SYNTACTICALLY INCORRECT CODE
# ----------------------------------------
# For r = 1 to tbl.rows.count
#   For c = 1 to tbl.columns.count
#      tbl.cell(r,c).Shape.Textframe.Text
# ----------------------------------------
# END SYNTACTICALLY INCORRECT CODE
# BEGIN SYNTACTICALLY CORRECT CODE
# ----------------------------------------
for r in range(len(tbl.rows)):
    for c in range(len(tbl.columns)):
        print(tbl.cell(r, c).text)
# ----------------------------------------
# END SYNTACTICALLY CORRECT CODE
# ----------------------------------------
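
When the row and column indices themselves are not needed, the same traversal can walk the rows and their cells directly. A minimal sketch, assuming `tbl` is a table object as above; `table_to_rows` is a hypothetical helper name:

```python
def table_to_rows(tbl):
    """Return the table as a list of rows, each a list of cell strings."""
    return [[cell.text for cell in row.cells] for row in tbl.rows]

# Example use:
# for row in table_to_rows(tbl):
#     print(" | ".join(row))
```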

A Note About your Original Code

The continue keyword

In your original source code, you have the following for-loop:

for shape in slide.shapes:
    if not shape.has_text_frame:
      continue

As indented in the question, that `continue` has no effect.

The continue keyword means "skip the rest of the loop body and start the next iteration". However, there is no code after your continue and before the end of the loop body, so the loop would have moved on to the next shape anyway. The continue only matters if the paragraph loop is nested inside the shape loop, after the if-statement.
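
A sketch of the nesting that makes the `continue` meaningful is given here. It wraps the question's loop in a hypothetical function, `collect_text_runs`, and takes the `prs` object from the question; it still does not look inside tables.

```python
def collect_text_runs(prs):
    """Return the text of every run in every text frame of the presentation."""
    text_runs = []
    for slide in prs.slides:
        for shape in slide.shapes:
            if not shape.has_text_frame:
                continue  # nothing below runs for shapes without a text frame
            # nested inside the shape loop, so it runs once per text-bearing shape
            for paragraph in shape.text_frame.paragraphs:
                for run in paragraph.runs:
                    text_runs.append(run.text)
    return text_runs
```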

To understand more about continue consider the following example:

for k in [1, 2, 3, 4, 5]:
    print("For k ==", k, "we have k % 2 == ", k % 2)
    if not k % 2 == 0:
        continue
    print("For k ==", k, "we got past the `continue`")

The output is:

For k == 1 we have k % 2 ==  1
For k == 2 we have k % 2 ==  0
For k == 2 we got past the `continue`
For k == 3 we have k % 2 ==  1
For k == 4 we have k % 2 ==  0
For k == 4 we got past the `continue`
For k == 5 we have k % 2 ==  1

The following three pieces of code all print the exact same messages, regardless of the use of the continue keyword:

for k in [1, 2, 3, 4, 5]:
    print(k)

for k in [1, 2, 3, 4, 5]:
    print(k)
    continue

for k in [1, 2, 3, 4, 5]:
    print(k)
    if k % 2 == 0:
        continue
Toothpick Anemone

Your code will miss more text than just tables; it won't see text in shapes that are part of groups, for example.
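
In python-pptx terms, reaching grouped shapes could look roughly like the sketch below. It is an illustration only: `iter_shapes` is a hypothetical helper, and it assumes that group shapes are the only shapes carrying a `shapes` attribute of their own.

```python
def iter_shapes(shapes):
    """Yield every shape in `shapes`, descending into groups."""
    for shape in shapes:
        # assumption: only group shapes expose a nested `.shapes` collection
        children = getattr(shape, "shapes", None)
        if children is not None:
            yield from iter_shapes(children)
        else:
            yield shape

# Example use:
# for shape in iter_shapes(slide.shapes):
#     if shape.has_text_frame:
#         print(shape.text_frame.text)
```

A stricter variant could compare `shape.shape_type` with `MSO_SHAPE_TYPE.GROUP` from `pptx.enum.shapes` instead of probing for the attribute.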

For tables, you'll need to do a couple of things:

Test the shape to see if the shape's .HasTable property is true (has_table in python-pptx). If so, you can work with the shape's .Table object (table in python-pptx) to extract the text. Conceptually, and very much aircode rather than Python:

For r = 1 to tbl.rows.count
   For c = 1 to tbl.columns.count
      tbl.cell(r,c).Shape.Textframe.Text ' is what you're after
Steve Rindsberg
  • Forgot to mention: it may get more complex if you have tables within placeholders. Shout if you need help with that one. – Steve Rindsberg Jan 09 '15 at 03:36
  • I am trying to use this solution but it does not work for me. It says `count` attribute is not available. – chintan s Jan 29 '19 at 10:41
  • @chintans Please post the code that you're using, indicate the line where you get this error and mention what sort of object the code is working on at that point. – Steve Rindsberg Jan 29 '19 at 15:15
  • Thank you, here is the link to my question with code https://stackoverflow.com/questions/54419118/extract-table-from-powerpoint – chintan s Jan 29 '19 at 16:26
  • The line of code `tbl.rows.count` is not valid code for python's `pptx` library. Objects instantiated from `pptx.table.Table` do have an attribute labeled `rows`, but `Table.rows` has no attribute labeled, `count`. Instead, we have to write something like, `for r in range(sum(1 for _ in iter(tbl.rows)))` – Toothpick Anemone Aug 04 '22 at 22:13
  • `For r = 1 to 10` is not how you write a for-loop in python. Given that the question is about python, I don't know why you wrote your code half in python and half in pseudo-code. In python there is no `to` keyword. Also, should replace the assignment statement `r = 1` with `r in`. A for-loop in python looks like `for r in range(1, 1 + tbl.rows.count)` – Toothpick Anemone Aug 04 '22 at 22:15
  • @SamuelMuldoon Because I don't use python but I know a good bit about PowerPoint. And apparently several people have already been able to look past that to see the big picture. If you have a *working* solution in python, please feel free to edit my answer. – Steve Rindsberg Aug 05 '22 at 02:31