
As an example, I have a generic script that outputs the default table styles using python-docx (this code runs fine):

import docx
d=docx.Document()
type_of_table=docx.enum.style.WD_STYLE_TYPE.TABLE

list_table=[['header1','header2'],['cell1','cell2'],['cell3','cell4']]
numcols=max(map(len,list_table))
numrows=len(list_table)

styles=(s for s in d.styles if s.type==type_of_table)


for stylenum,style in enumerate(styles,start=1):
    label=d.add_paragraph('{}) {}'.format(stylenum,style.name))
    label.paragraph_format.keep_with_next=True
    label.paragraph_format.space_before=docx.shared.Pt(18)
    label.paragraph_format.space_after=docx.shared.Pt(0)
    table=d.add_table(numrows,numcols)
    table.style=style
    for r,row in enumerate(list_table):
        for c,cell in enumerate(row):
            table.row_cells(r)[c].text=cell


d.save('tablestyles.docx')   

Next, I opened the document, highlighted a split table and under paragraph format, selected "Keep with next," which successfully prevented the table from being split across a page:

[Screenshot: the table kept together on one page after selecting "Keep with next"]

Here is the XML code of the non-broken table:

[Screenshot: XML of the non-broken table, with the paragraph property line highlighted]

You can see the highlighted line shows the paragraph property that should be keeping the table together. So I wrote this function and stuck it in the code above the d.save('tablestyles.docx') line:

def no_table_break(document):
    tags=document.element.xpath('//w:p')
    for tag in tags:
        ppr=tag.get_or_add_pPr()
        ppr.keepNext_val=True


no_table_break(d)     

When I inspect the XML code the paragraph property tag is set properly and when I open the Word document, the "Keep with next" box is checked for all tables, yet the table is still split across pages. Am I missing an XML tag or something that's preventing this from working properly?

LMc
  • I think you're going to need to be more specific about what an "orphaned" row is. The next step will then be to see if you can accomplish the result you're after using the Word application/UI. If you can narrow it down that way you can determine the XML element/attribute that makes the difference. `w:cantSplit` could determine whether a cell is split across pages (with its row of course). – scanny Jan 10 '17 at 21:34
  • @scanny all I mean by orphaned row is that part of a table is on one page and the other part of the table is on another. – LMc Jan 10 '17 at 21:36
  • The question is whether the break is on an even row boundary or broken within the row, like part of a row on one page and the rest of it at the top of the next page. These are distinct (mis-)behaviors. – scanny Jan 10 '17 at 22:27
  • @scanny in my case the breaks are on even row boundaries, not a wrap around row. – LMc Jan 11 '17 at 00:50
  • Can you accomplish the result you're looking for with the Word application? If so, what did you do that worked? Like what dialog box option or whatever? – scanny Jan 11 '17 at 02:34
  • @scanny when I highlight the table that is broken across two pages and go to the paragraph dialogue box, under the "Line and Page Breaks" tab, selecting the "Keep with next" checkbox does the trick. – LMc Jan 11 '17 at 16:00
  • Ah, ok, interesting. Now I see your situation. I don't know how to fix it off the top of my head. This is the point at which I would inspect the XML from the working example and compare it to the one that doesn't. Once you have the required XML elements/attributes understood you can set about getting them set properly, probably by working with the XML directly using lxml calls, the kind of thing you can find examples of using "python-docx workaround function" on search. – scanny Jan 11 '17 at 20:25
  • @scanny After inspecting the XML elements of a table split across a page and not split across a page, I identified the XML element that should make this work, but it still doesn't – LMc Feb 08 '17 at 21:38
  • I'm also struggling to keep my tables together. I _think_ one confusion here is that the setting you're referring to keeps Word from splitting a table row across pages. That does seem to work, but keeping the whole table together is what you (and I) really want. It would help if tables were added to paragraphs; then we could keep the paragraph from spanning a page break. However, because they are added to the document, some document-level setting is needed. – TSeymour Jul 23 '18 at 21:46

3 Answers

Ok, I also needed this. I think we were all making the incorrect assumption that the setting in Word's table properties (or the equivalent ways to achieve this in python-docx) was about keeping the table from being split across pages. It's not -- instead, it's simply about whether or not a table's rows can be split across pages.

Given that we know how to do this successfully in python-docx, we can prevent tables from being split across pages by putting each table within a row of a larger master table. The code below does this. I'm using Python 3.6 and python-docx 0.8.6.

import docx
from docx.oxml.shared import OxmlElement
import os
import sys


def prevent_document_break(document):
    """https://github.com/python-openxml/python-docx/issues/245#event-621236139
       Globally prevent table rows from splitting across pages.
    """
    # Visit every table row (<w:tr>) in the document, nested tables included.
    for tr in document.element.xpath('//w:tr'):
        cant_split = OxmlElement('w:cantSplit')  # Create the "can't split" element
        tr.append(cant_split)  # Append it to the row


d = docx.Document()
type_of_table = docx.enum.style.WD_STYLE_TYPE.TABLE

list_table = [['header1', 'header2'], ['cell1', 'cell2'], ['cell3', 'cell4']]
numcols = max(map(len, list_table))
numrows = len(list_table)

styles = (s for s in d.styles if s.type == type_of_table)

big_table = d.add_table(1, 1)
big_table.autofit = True

for stylenum, style in enumerate(styles, start=1):
    cells = big_table.add_row().cells
    label = cells[0].add_paragraph('{}) {}'.format(stylenum, style.name))
    label.paragraph_format.keep_with_next = True
    label.paragraph_format.space_before = docx.shared.Pt(18)
    label.paragraph_format.space_after = docx.shared.Pt(0)

    table = cells[0].add_table(numrows, numcols)
    table.style = style
    for r, row in enumerate(list_table):
        for c, cell in enumerate(row):
            table.row_cells(r)[c].text = cell

prevent_document_break(d)

d.save('tablestyles.docx')

# because I'm lazy...
openers = {'linux': 'libreoffice tablestyles.docx',
           'linux2': 'libreoffice tablestyles.docx',
           'darwin': 'open tablestyles.docx',
           'win32': 'start tablestyles.docx'}
os.system(openers[sys.platform])
TSeymour

I struggled with this problem for some hours and finally found a solution that works for me. I just changed the XPath in the original poster's code, so it now looks like this:

def keep_table_on_one_page(doc):
    tags = doc.element.xpath('//w:tr[position() < last()]/w:tc/w:p')
    for tag in tags:
        ppr = tag.get_or_add_pPr()
        ppr.keepNext_val = True

The key part is this selector:

[position() < last()]

We want every row except the last one in each table to keep with the next one. That chains the rows together without tying the last row to whatever follows the table.

ekon

I would have left this as a comment under @DeadAd's answer, but I don't have enough rep. In case anyone wants to stop one specific table from breaking, rather than every table in the document, change the XPath to the following:

tags = table._element.xpath('./w:tr[position() < last()]/w:tc/w:p')

where table refers to the instance of <class 'docx.table.Table'> which you want to keep together.

"//" selects every node in the document that matches the XPath, regardless of where it sits; "./" starts the selection from the current node (here, the table element).

Artsiom Vahin