-1

I have a developed a piece of code that I would like to run on a folder of docx files. I have successfully figured out how I can run this on one file (see code below) but am too green to figure out how to batch this.

I would like to write all docx files into the same output.txt file as opposed to writing to a separate txt file for each docx file.

Thank you for your help!

import docx

from docx import Document

def readtxtandtab(filename):
    doc = docx.Document(filename)
    fullText = []
    for table in doc.tables:
        for row in table.rows:
            for cell in row.cells:
                fullText.append(cell.text)

    for para in doc.paragraphs:
        fullText.append(para.text)

    return '\n'.join(fullText)

with open("Output.txt", "a") as text_file:
    text_file.write(readtxtandtab('filename.docx'))
David Specht
  • 7,784
  • 1
  • 22
  • 30

1 Answers1

0

You can use glob to get all the word files in a directory. You can then use a for loop to execute your code on every file.

import glob
.
.
.
with open("output.txt", "a") as text_file:
    for wordfile in glob.glob('*.docx'):
        text_file.write(readtxtandtab(wordfile))

For glob the string *.docx means select anything that ends with .docx.

Alex
  • 963
  • 9
  • 11
  • Thank you very much. Glob worked very nicely but I had to make a change two changes in the code you provided. First, I had to make it [:-4], which is probably what you meant to do anyway because it was otherwise capturing the first 4 characters of the filename and not the last 4. Second, I had to add an encoding parameter within the open function because I was getting an encoding error. I do still have multiple txt files instead of the one combined one that I was hoping to get to. – Vamsi Tetali Jul 10 '19 at 14:40
  • @VamsiTetali Sorry I missed the second part of your question. The code should now write everything to the same file. – Alex Jul 10 '19 at 23:26