I'm trying to append a column to a table in PowerPoint using python-pptx. A number of threads mention the solution:

import copy

from pptx.table import _Cell

def append_col(prs_obj, sl_i, sh_i):
    # prs_obj is a pptx.Presentation('path') object.
    # sl_i and sh_i are int indexes that locate a particular table shape.

    tab = prs_obj.slides[sl_i].shapes[sh_i].table
    new_col = copy.deepcopy(tab._tbl.tblGrid.gridCol_lst[-1])
    tab._tbl.tblGrid.append(new_col)  # copies last grid element

    for tr in tab._tbl.tr_lst:
        # duplicate last cell of each row
        new_tc = copy.deepcopy(tr.tc_lst[-1])
        tr.append(new_tc)
        cell = _Cell(new_tc, tr.tc_lst)
        cell.text = '--'
    return tab

After running this, the new column is there when you open the file in PowerPoint, but it doesn't show the cell.text. If you click in a new cell and type, the letters appear in the cell of the previous column. Saving the file from PowerPoint lets you edit the column as normal, but by then you've lost the cell.text (and formatting).

QUESTION UPDATE 1- FOLLOWING COMMENT FROM @scanny

For the simplest possible case, a (1x3) table, like so: |xx|--|xx| the tab._tbl.xml prints before and after appending the column are:

xml diff 1

xml diff 2

xml diff 3

xml diff 4

QUESTION UPDATE 2- FOLLOWING COMMENT FROM @scanny

I modified the above append_col function to forcibly remove the extLst element from the copied gridCol. This stopped the problem of typing in one cell and the text appearing in another cell.

def append_col(prs_obj, sl_i, sh_i):
    # existing lines removed for brevity

    # New code (assumes: from pptx import oxml)
    tblchildren = tab._tbl.getchildren()
    for child in tblchildren:
        if isinstance(child, oxml.table.CT_TableGrid):
            ws = set()
            for j in child:
                if j.w not in ws:
                    ws.add(j.w)
                else:
                    # repeated width: strip this gridCol's children (the extLst)
                    for elem in list(j):
                        j.remove(elem)
    return tab

However, cell.text (and formatting) are still missing. Moreover, manually saving the presentation in PowerPoint changes the tab._tbl.xml output back. The screenshots before and after manually saving the presentation are:

AFTER removing extLst, before manual save - xml diff 1

AFTER removing extLst, AFTER manual save - xml diff 2

thunt
  • Please note, I posted a similar question at https://stackoverflow.com/questions/64249427/using-python-pptx-package-to-append-a-table-row-will-add-a-row-that-when-edited - the focus of this question was rows, not columns. Also the focus of that question led answers to believe it was related to the content of the slide. That is not the case. It's related to the deepcopy method. – thunt Oct 29 '20 at 12:52

2 Answers

If you're serious about solving this sort of problem, you'll need to reverse-engineer the PowerPoint XML for this aspect of tables.

The place to start is with before and after (adding a column) XML dumps of the table, identifying the changes made by PowerPoint, then duplicating those that matter (things like revision numbers probably don't matter).

This process is simpler with a small example, say growing a 2 x 2 table into a 2 x 3 table.

You can get the XML for a python-pptx XML element using its .xml attribute, like:

print(tab._tbl.xml)

You could then compare the XML that deepcopy produces against the XML PowerPoint produces, which gives you concrete differences to explain why the result doesn't work. I expect you'll find that table items have unique ids and when you duplicate those, funky things happen.
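
One way to get those concrete differences is to capture the XML string before and after the change and diff the two. Below is a minimal sketch using only the standard library. `xml_diff` is a hypothetical helper name, and the two sample strings are simplified stand-ins for what `tab._tbl.xml` would return, not real PowerPoint output:

```python
import difflib

def xml_diff(before, after):
    """Return a unified diff of two XML dumps as a list of lines."""
    return list(difflib.unified_diff(
        before.splitlines(), after.splitlines(),
        fromfile="before", tofile="after", lineterm=""))

# Simplified stand-ins for tab._tbl.xml captured before and after the append
before = "<tr>\n  <tc>xx</tc>\n  <extLst/>\n</tr>"
after = "<tr>\n  <tc>xx</tc>\n  <extLst/>\n  <tc>xx</tc>\n</tr>"

for line in xml_diff(before, after):
    print(line)
```

In practice you would store `tab._tbl.xml` in a variable before modifying the table, then pass that and the fresh `tab._tbl.xml` to the helper. Lines starting with `+` show what the modification added and where.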

scanny
  • Hi scanny, thanks for the comment. I am serious about solving the problem, however I am not familiar with xml, or how it relates to the complex set of objects from python-pptx. I've updated the question with a set of images of the .xml text before and after appending cols. – thunt Oct 30 '20 at 12:03
  • I would start with a focus on the duplicated gridCol ID. All the others are unique. You might try deleting the whole extLst element in that gridCol element and see if that makes a difference. – scanny Nov 01 '20 at 19:27
  • I deleted the extLst element as you suggested. It solved half the problem, but still not showing the text unfortunately. I've updated the question if you have any more ideas. Thanks for your time. – thunt Nov 02 '20 at 18:53
  • You're appending the new cell at the end, so an extLst element is left in between the new cell and the previous cell. Use addnext() or another suitable lxml method to keep the elements contiguous. Element sequence matters in the PowerPoint XML vocabulary. – scanny Nov 02 '20 at 19:48
  • Thanks, I understand the problem now. The tr.append(new_tc) is the offending line which incorrectly places the new_tc object after the extLst element. However, I'm really struggling to modify tr.xml. All tutorials I read on xml start with tree = ET.parse(file), but tr.xml isn't a file, it's a 'pptx.oxml.xmlchemy.XmlString' object. I tried trchildren = tr.getchildren() then trchildren.insert(5, new_tc). This makes trchildren look like what I want, but it doesn't affect tr and tab._tbl unfortunately. Any help please? – thunt Nov 03 '20 at 19:27
  • Why don't you accept this answer and ask a new question (or two) focused on your particular next-step challenge. I think we've strayed beyond the original question now. – scanny Nov 03 '20 at 21:26
  • I've added a new question: https://stackoverflow.com/questions/64678503/how-do-you-modify-the-xml-of-a-pptx-oxml-xmlchemy-xmlstring-object-at-a-particul . I will accept the answer when I know it to be correct. At the moment, my original problem of adding a column without the contents corrupting is still unanswered. – thunt Nov 04 '20 at 10:44

With help from scanny, I've come up with the following workaround, which works:

import copy

from pptx import oxml
from pptx.table import _Cell

def append_col(prs_obj, sl_i, sh_i):
    tab = prs_obj.slides[sl_i].shapes[sh_i].table
    new_col = copy.deepcopy(tab._tbl.tblGrid.gridCol_lst[-1])
    tab._tbl.tblGrid.append(new_col)  # copies last grid element
    for tr in tab._tbl.tr_lst:
        new_tc = copy.deepcopy(tr.tc_lst[-1])
        # insert right after the last cell so it stays ahead of any extLst
        tr.tc_lst[-1].addnext(new_tc)
        cell = _Cell(new_tc, tr.tc_lst)
        for paragraph in cell.text_frame.paragraphs:
            for run in paragraph.runs:
                run.text = '--'
    # strip the children (the extLst) of any gridCol whose width repeats
    tblchildren = tab._tbl.getchildren()
    for child in tblchildren:
        if isinstance(child, oxml.table.CT_TableGrid):
            ws = set()
            for j in child:
                if j.w not in ws:
                    ws.add(j.w)
                else:
                    for elem in list(j):
                        j.remove(elem)
    return tab
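
The reason addnext() matters here is element order: append() puts the copied cell after the row's trailing extLst, while inserting right after the last tc keeps the cells contiguous. The sketch below illustrates only that ordering. It uses the standard library's ElementTree and a simplified, un-namespaced row, both of which are assumptions for illustration (python-pptx elements are lxml objects, where addnext() does this in one call). `child_tags` and `insert_after_last_cell` are hypothetical helper names:

```python
import copy
import xml.etree.ElementTree as ET

def child_tags(row):
    """List the tag names of a row's direct children, in document order."""
    return [child.tag for child in row]

def insert_after_last_cell(row):
    """Copy the last <tc> and place the copy directly after it."""
    children = list(row)
    last_tc_index = max(i for i, c in enumerate(children) if c.tag == "tc")
    new_tc = copy.deepcopy(children[last_tc_index])
    row.insert(last_tc_index + 1, new_tc)  # called on the element itself
    return new_tc

# Simplified stand-in for a table row that ends with an extLst element
ROW_XML = "<tr><tc>a</tc><tc>b</tc><extLst/></tr>"

appended = ET.fromstring(ROW_XML)
appended.append(copy.deepcopy(appended[1]))  # copy lands after extLst
print(child_tags(appended))

inserted = ET.fromstring(ROW_XML)
insert_after_last_cell(inserted)             # copy stays next to the cells
print(child_tags(inserted))
```

Note that insert() is called on the row element itself. Inserting into the list returned by getchildren() only changes that list, which is why that attempt left tr and tab._tbl unchanged.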
thunt