I'm trying to create a script to find and replace multiple words in a Word document. win32com works really well for this, but it's pretty slow, and with more than 10 terms it can take several minutes. This script takes ~5 minutes to finish, even though there's only 1 occurrence of each term in the document.

import win32com.client as win32

term_list = ['Item0', 'Item1', 'Item2', 'Item3', 'Item4', 'Item4', 'Item5', 'Item6', 'Item7', 'Item8', 'Item9']

path_docx = r'C:\Temp\Document.docx'

word = win32.gencache.EnsureDispatch('Word.Application')

const = win32.constants
word.Visible = True
doc = word.Documents.Open(path_docx)

for paragraph in doc.Paragraphs:
    #print(paragraph)
    for item in term_list:
        paragraph.Range.Find.Execute(FindText=item, ReplaceWith="Replaced", Replace=const.wdReplaceAll)

doc.SaveAs(r'C:\Temp\Document_replaced.docx')

Is there anything I can do to improve my code?

I also know that python-docx exists, but I'd prefer to use Word itself for the find and replace if possible.

Edit:

As @tst suggested, setting word.Visible = False helped a bit, but my code was also making many calls to COM, which hurt performance: it ran a separate Find for every term in every paragraph. This new code runs one Find per term over the whole document and is really fast. It scales up nicely too: I can loop through a list of about 500 terms in 5 seconds.

import win32com.client as win32

term_list = ['Item0', 'Item1', 'Item2', 'Item3', 'Item4', 'Item4', 'Item5', 'Item6', 'Item7', 'Item8', 'Item9']

path_docx = r'C:\Temp\Document.docx'
word = win32.gencache.EnsureDispatch('Word.Application')

const = win32.constants
word.Visible = False
doc = word.Documents.Open(path_docx)

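for item in term_list:
    # One Find on the selection per term, instead of one per term per paragraph
    find_object = word.Selection.Find
    find_object.ClearFormatting()
    find_object.Text = item
    find_object.Replacement.ClearFormatting()
    find_object.Replacement.Text = "FOUND"
    # A single Execute call replaces every occurrence of the term
    find_object.Execute(Replace=const.wdReplaceAll)

doc.Save()
doc.Close()
word.Quit()  # close the hidden Word instance so it doesn't keep running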
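
For anyone who wants to reuse this, here is a minimal sketch of the same approach wrapped in functions. The names `replace_terms`, `replace_in_docx` and `WD_REPLACE_ALL` are made up for illustration and are not part of win32com. It assumes Word is installed, hard-codes 2 as the numeric value of `wdReplaceAll`, and skips duplicate terms so that each term costs only one `Execute` call. I haven't benchmarked this version.

```python
WD_REPLACE_ALL = 2  # numeric value of Word's wdReplaceAll constant


def replace_terms(app, terms, replacement):
    """Run one replace-all Find per unique term; return the number of Execute calls.

    `app` is a Word.Application COM object (or anything with the same shape).
    """
    calls = 0
    for term in dict.fromkeys(terms):  # drops duplicates, keeps order
        find = app.Selection.Find
        find.ClearFormatting()
        find.Text = term
        find.Replacement.ClearFormatting()
        find.Replacement.Text = replacement
        find.Execute(Replace=WD_REPLACE_ALL)
        calls += 1
    return calls


def replace_in_docx(path_docx, terms, replacement):
    """Open the document in a hidden Word instance, replace, save, clean up."""
    # Imported here so replace_terms can be used without pywin32 installed
    import win32com.client as win32

    word = win32.gencache.EnsureDispatch('Word.Application')
    word.Visible = False
    try:
        doc = word.Documents.Open(path_docx)
        try:
            replace_terms(word, terms, replacement)
            doc.Save()
        finally:
            # 0 is wdDoNotSaveChanges; on success the document was saved above
            doc.Close(SaveChanges=0)
    finally:
        word.Quit()  # assumption: no other documents are open in this instance
```

Calling `replace_in_docx(r'C:\Temp\Document.docx', term_list, 'FOUND')` should then behave like the script above, except that Word is closed even if a replace fails partway through.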
Spooknik
  • I have played around with Word and Excel using win32com a lot, and as far as I can tell, making the app run in the background makes it a lot faster. Maybe time the execution time (like on your phone) and then change `word.Visible` to `False` and try again. For my scripts, they tend to be a couple of seconds faster that way at least. – tst Jan 16 '20 at 11:28
  • @tst Thanks! I'll give that a shot. – Spooknik Jan 16 '20 at 11:30
  • Just to follow up, performance seems to be a bit better with `word.Visible` set to `False`. But the real performance killer was, I believe, all the calls to COM. I found a much more efficient way to code it (see edit in OP). – Spooknik Jan 16 '20 at 12:37
