Bold, underlining, and Iterations with python-docx

Question

I am writing a program to take data from an ASCII file and place it in the appropriate place in a Word document, making only particular words bold and underlined. I am new to Python, but I have extensive experience in MATLAB programming. My code is:

#IMPORT ASCII DATA AND MAKE IT USEABLE
#Alternatively Pandas - gives better table display results
import pandas as pd
data = pd.read_csv('203792_M-51_Niles_control_SD_ACSF.txt', sep=",", header=None)
#print data
#data[1][3]  gives value at particular data points within matrix
i=len(data[1])
print('Number of Points imported = {0}'.format(i))
#IMPORT WORD DOCUMENT
import docx  #Opens Python Word document tool
from docx import Document  #Invokes Document command from docx
document = Document('test_iteration.docx')  #Imports Word Document to Modify
t = len(document.paragraphs)  #gives the number of paragraphs in document
print('Total Number of lines = {0}'.format(t))
#for paragraph in document.paragraphs:
   # print(para.text)  #Prints the text in the entire document
font = document.styles['Normal'].font
font.name = 'Arial'
from docx.shared import Pt
font.size = Pt(8)
#font.bold = True
#font.underline = True
for paragraph in document.paragraphs:
    if 'NORTHING:' in paragraph.text:
        #print paragraph.text
        paragraph.text = 'NORTHING: \t' + str(data[1][0])
        print(paragraph.text)
    elif 'EASTING:' in paragraph.text:
        #print paragraph.text
        paragraph.text = 'EASTING: \t' + str(data[2][0])
        print(paragraph.text)
    elif 'ELEV:' in paragraph.text:
        #print paragraph.text
        paragraph.text = 'ELEV: \t' + str(data[3][0])
        print(paragraph.text)
    elif 'CSF:' in paragraph.text:
        #print paragraph.text
        paragraph.text = 'CSF: \t' + str(data[8][0])
        print(paragraph.text)
    elif 'STD. DEV.:' in paragraph.text:
        #print paragraph.text
        paragraph.text = ('STD. DEV.: ' + 'N: ' + str(data[5][0]) + '\t E: ' +
                          str(data[6][0]) + '\t EL: ' + str(data[7][0]))
        print(paragraph.text)
#for paragraph in document.paragraphs:
   #print(paragraph.text)  #Prints the text in the entire document
#document.save('test1_save.docx') #Saves as Word Document after Modification

My question is how to make only the "NORTHING:" bold and underlined in:

    paragraph.text = 'NORTHING: \t' + str(data[1][0])
    print(paragraph.text)

So I wrote a pseudo "find and replace" command that works great if all the values being replaced are exactly the same. However, I need to replace the values in the second paragraph with the values from the second array of the ASCII file, the third paragraph with the values from the third array, etc. (I have to use find and replace because the formatting of the document is too advanced for me to replicate in a program, unless there is a program that can read the Word file and write it back out as a Python script, i.e. reverse engineer it.)

I am still learning, so the code may seem crude to you. I am just trying to automate this boring process of copying and pasting.

David Zemens · Accepted Answer · 2018-12-05T19:57:36.513

Untested, but I am assuming python-docx is similar to python-pptx. It should be: it's maintained by the same developer, and a cursory review of the documentation suggests that it interfaces with the PPT/DOC files in the same way, uses the same methods, etc.

In order to manipulate substrings of paragraphs or words, you need to use the run object:

https://python-docx.readthedocs.io/en/latest/api/text.html#run-objects

In practice, this looks something like:

for paragraph in document.paragraphs:
    if 'NORTHING:' in paragraph.text:
        paragraph.clear()
        run = paragraph.add_run()
        run.text = 'NORTHING: \t'
        run.font.bold = True
        run.font.underline = True
        run = paragraph.add_run()
        run.text = str(data[1][0])

Conceptually, you create a run instance for each part of the paragraph/text that you need to manipulate. So, first we create a run with the bold, underlined font, then we add another run (which I think will not be bold/underlined, but if it is, just set those properties to False).
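If the same label/value pattern is needed for several paragraphs, it could be wrapped in a small helper. This is only a sketch and just as untested against a real document as the code above: `write_labeled` is a made-up name (not part of python-docx), and it assumes `paragraph` is a python-docx paragraph object with the `clear()` and `add_run()` methods used above.

```python
def write_labeled(paragraph, label, value):
    """Replace the paragraph content with a bold, underlined label and a plain value.

    `write_labeled` is a hypothetical helper, not a python-docx function.
    """
    paragraph.clear()  # drop the existing runs before adding new ones

    # First run: the label, bold and underlined
    label_run = paragraph.add_run(label)
    label_run.font.bold = True
    label_run.font.underline = True

    # Second run: the value, with the formatting explicitly switched off
    value_run = paragraph.add_run(str(value))
    value_run.font.bold = False
    value_run.font.underline = False
    return paragraph
```

Inside the loop, the body of the `'NORTHING:'` branch would then reduce to `write_labeled(paragraph, 'NORTHING: \t', data[1][0])`.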

Note: it's preferable to put all of your import statements at the top of a module.

This can be optimized a bit by using a mapping object like a dictionary, which you can use to associate the matching values ("NORTHING") as keys and the remainder of the paragraph text as values. ALSO UNTESTED

import pandas as pd
from docx import Document  
from docx.shared import Pt

data = pd.read_csv('203792_M-51_Niles_control_SD_ACSF.txt', sep=",", header=None)
i=len(data[1])
print('Number of Points imported = {0}'.format(i))
document = Document('test_iteration.docx')  #Imports Word Document to Modify
t = len(document.paragraphs)  #gives the number of paragraphs in document
print('Total Number of lines = {0}'.format(t))
font = document.styles['Normal'].font
font.name = 'Arial'
font.size = Pt(8)

# This maps the matching strings to the data array values
data_dict = {
    'NORTHING:': data[1][0],
    'EASTING:': data[2][0],
    'ELEV:': data[3][0],
    'CSF:': data[8][0],
    'STD. DEV.:': 'N: {0}\t E: {1}\t EL: {2}'.format(data[5][0], data[6][0], data[7][0])
    }

for paragraph in document.paragraphs:
    for k,v in data_dict.items():
        if k in paragraph.text:
            paragraph.clear()
            run = paragraph.add_run()
            run.text = k + '\t'
            run.font.bold = True
            run.font.underline = True
            run = paragraph.add_run()
            run.text = '{0}'.format(v)

Thanks for the quick response! I receive this error when trying to use the dictionary: "for k,v in data_dict.items: TypeError: 'builtin_function_or_method' object is not iterable" Maybe it does not like k,v in the same for loop? – Cory Smith, Dec 05 '18 at 19:54
@CorySmith sorry, add the parentheses: `for k,v in data_dict.items():` – David Zemens, Dec 05 '18 at 19:57
That worked! Now I am trying to iterate through the rows of the "data" matrix. I tried to replace the zeros in the data_dict = { 'NORTHING:' : data[1][I], ... run.text = '{0}'.format(v) ... I=I+1, however that did not work. Can you point me in the correct direction? In Matlab I would have done a for loop within a while loop, but the dictionary is throwing me off... – Cory Smith, Dec 05 '18 at 20:24
Well, you can't put an undefined variable in a dictionary constructor, so that won't work. But you could just use `data[1]` and `data[2]` in the dictionary, and then in your loop you would do `'{0}'.format(v[i])` (assuming `i` is your iterator/counter). But really it's not clear what you're doing at this point, and you should refrain from using the comments to extend with multiple follow-up queries. Time to ask a new/separate question, and you'll be more likely to get an answer (because more people will see it). – David Zemens, Dec 05 '18 at 20:42
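
A rough sketch of what the last comment suggests: store whole columns in the dictionary and index them with the row counter inside the loop. The numbers below are invented stand-ins so the snippet runs on its own; in the thread, `data` is the pandas DataFrame, which is indexed the same way (`data[column][row]`). `values_for_row` and `columns` are hypothetical names, and how each row maps to a paragraph in the Word file is not specified in the thread, so that part is left out.

```python
def values_for_row(columns, row):
    """Return {label: text} for one row; `columns` maps each label to a whole column."""
    return dict((label, '{0}'.format(column[row])) for label, column in columns.items())


# Stand-in for the DataFrame (invented values): data[column][row]
data = {
    1: [100.5, 200.5],
    2: [10.0, 20.0],
    3: [5.25, 6.25],
}

# Map each label to a column instead of a single value
columns = {
    'NORTHING:': data[1],
    'EASTING:': data[2],
    'ELEV:': data[3],
}

# One dictionary of label -> text per row of the data
rows = [values_for_row(columns, i) for i in range(len(data[1]))]
```

Each entry of `rows` could then be used in place of `data_dict` for whichever block of paragraphs belongs to that row.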
