10

Python Docx is a pretty good library for generating Microsoft Word documents for something that doesn't directly deal with all of the COM stuff. Nonetheless, I'm running into some limitations.

  • Does anyone have any idea how one would put a carriage return in a string of text?

I want a paragraph to have multiple lines without there being extra space between them. However, writing out a string that separates the lines with the usual \n is not working. Nor is using &#10 or &#13. Any other thoughts, or is this framework too limited for something like that?

user1427661
  • 11,158
  • 28
  • 90
  • 132

3 Answers3

8

I'm not sure if this is possible. It looks as though Word is in fact treating presses of the enter key (I am treating this action as a sort of programmatic equivalent of "\r\n" and "\n") as the creation of a new paragraph.


If I record a macro in Word that consists of:

  1. Typing the text "One"
  2. Pressing the enter key

I get VBA of:

Selection.TypeText Text:="One"
Selection.TypeParagraph

If I create a Word document that looks like this (pressing enter after each word):

One

Two

Three

The body of that document looks like this in the documents.xml file:

<w:body>
    <w:p w:rsidR="00BE37B0" w:rsidRDefault="00CF2350">
        <w:r>
            <w:t>One</w:t>
        </w:r>
    </w:p>
    <w:p w:rsidR="00CF2350" w:rsidRDefault="00CF2350">
        <w:r>
            <w:t>Two</w:t>
        </w:r>
    </w:p>
    <w:p w:rsidR="00CF2350" w:rsidRDefault="00CF2350">
        <w:r>
            <w:t>Three</w:t>
        </w:r>
    </w:p>
    <w:sectPr w:rsidR="00CF2350" w:rsidSect="001077CC">
        <w:pgSz w:w="11906" w:h="16838"/>
        <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="708" w:footer="708" w:gutter="0"/>
        <w:cols w:space="708"/>
        <w:docGrid w:linePitch="360"/>
    </w:sectPr>
</w:body>

From MSDN we can see that the <w:p> element represents a paragraph.


I think the solution to this would be to follow the example in Python Docx:

body.append(paragraph("Hi."))
body.append(paragraph("My name is Alice."))
body.append(paragraph("Let's code"))

Or:

for paragraph_text in "Hi. \nMy name is Alice.\n Let's code".split("\n"):
    body.append(paragraph(paragraph_text.strip()))

Edit:

Looking into this some more, if you press Shift + Enter in Word it adds a manual line break (not a paragraph) via appending Chr(11). In Open XML, this translates to a Break.

Looking at the docx.py file of Python Docx, something like this might be the way to go (disclaimer: not tested):

for text in "Hi. \nMy name is Alice.\n Let's code".split("\n"):
    run = makeelement('r')
    run.append(makeelement('t', tagtext=text))
    run.append(makeelement('br'))
    body.append(run)
nick_w
  • 14,758
  • 3
  • 51
  • 71
  • Your analysis of how the paragraphs are created looks right, but body.append(paragraph) doesn't appear to be a solution. That just recreates the problem you showed above with the "One", "Two", and "Three" paragraphs. What I want is something that somehow gets rid of that extra space between paragraphs, probably by somehow having a paragraph element recognize a single line break character. – user1427661 Jan 22 '13 at 17:52
8

You can achieve your carriage return using python-docx by calling add_break() on your run. For example:

doc = Document()
p = doc.add_paragraph()
run = p.add_run()
run.add_break()

python-docx reference

chirinosky
  • 4,438
  • 1
  • 28
  • 39
1

As of v0.7.2, python-docx translates '\n' and '\r' characters in a string to <w:br/> elements, which provides the behavior you describe. It also translates '\t' characters into <w:tab/> elements.

This behavior is available for strings provided to:

  • Document.add_paragraph()
  • Paragraph.add_run()

and for strings assigned to:

  • Paragraph.text
  • Run.text
scanny
  • 26,423
  • 5
  • 54
  • 80