I need to convert 140 .docx files to .txt. I am using the following Python code, which I found here on Stack Overflow:
import os
from docx import Document
# Path to the folder containing .docx files
input_folder = "C:/Users/XXXXX/Desktop/word"
# Path to the folder where .txt files will be saved
output_folder = "C:/Users/XXXXX/Desktop/Txt"
# Get a list of all .docx files in the input folder
files = [f for f in os.listdir(input_folder) if f.endswith(".docx")]
# Loop through each .docx file and convert it to .txt
for file in files:
    docx_path = os.path.join(input_folder, file)
    txt_path = os.path.join(output_folder, os.path.splitext(file)[0] + ".txt")
    doc = Document(docx_path)
    content = [p.text for p in doc.paragraphs]
    with open(txt_path, "w", encoding="utf-8") as txt_file:
        txt_file.write("\n".join(content))
print("Conversion complete!")
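For comparison only (this is not what the code above does), here is a minimal stdlib-only sketch that avoids importing any `docx` package at all. It relies on the fact that a .docx file is a ZIP archive whose main text lives in `word/document.xml`, with paragraphs stored as `<w:p>` elements and their text split across `<w:t>` runs. The helper names `docx_paragraphs` and `convert_folder` are hypothetical. It is a rough approximation of `doc.paragraphs`: it also picks up paragraphs inside tables, and it ignores tabs, line breaks, headers, footers and footnotes.

```python
import os
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the WordprocessingML elements inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(docx_path):
    """Hypothetical helper: return the text of every <w:p> in document.xml."""
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # One paragraph's text is spread across one or more <w:t> runs
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return paragraphs


def convert_folder(input_folder, output_folder):
    """Hypothetical helper: write one .txt per .docx, one line per paragraph."""
    os.makedirs(output_folder, exist_ok=True)  # create the output folder if missing
    for file in os.listdir(input_folder):
        if not file.endswith(".docx"):
            continue
        txt_name = os.path.splitext(file)[0] + ".txt"
        content = docx_paragraphs(os.path.join(input_folder, file))
        with open(os.path.join(output_folder, txt_name), "w", encoding="utf-8") as f:
            f.write("\n".join(content))
```

Usage with the folders from the question: `convert_folder("C:/Users/XXXXX/Desktop/word", "C:/Users/XXXXX/Desktop/Txt")`.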
But whenever I run the code, I get this error:
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
Cell In[1], line 2
1 import os
----> 2 from docx import Document
4 # Path to the folder containing .docx files
5 input_folder = "C:/Users/XXXXX/Desktop/word"
File ~\anaconda3\Lib\site-packages\docx.py:30
27 except ImportError:
28 TAGS = {}
---> 30 from exceptions import PendingDeprecationWarning
31 from warnings import warn
33 import logging
ModuleNotFoundError: No module named 'exceptions'
Do you know why I am getting this error and how I could solve it? Thank you!
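One thing worth checking: the traceback points at a single file, `site-packages\docx.py`, and that file does `from exceptions import ...`. The `exceptions` module only existed in Python 2, and the python-docx library (the one that provides `from docx import Document`) installs a `docx/` package directory rather than a `docx.py` file. That suggests (an assumption, not confirmed here) that a different, older PyPI distribution named `docx` is shadowing python-docx. The sketch below reports where `docx` resolves and which of the two distributions is installed, using `importlib.util.find_spec`, which locates the module without executing it, so the failing line never runs. The function name `describe_docx_module` is hypothetical.

```python
import importlib.util
from importlib import metadata


def describe_docx_module():
    """Hypothetical helper: locate 'docx' without importing (executing) it."""
    spec = importlib.util.find_spec("docx")  # None if nothing named docx exists
    origin = spec.origin if spec is not None else None

    installed = {}
    # Two separate PyPI distributions both provide an importable name 'docx'
    for dist in ("docx", "python-docx"):
        try:
            installed[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            installed[dist] = None
    return origin, installed


if __name__ == "__main__":
    origin, installed = describe_docx_module()
    print("docx resolves to:", origin)
    print("installed distributions:", installed)
```

If the origin ends in `docx.py` and the `docx` distribution shows a version, a commonly suggested remedy is `pip uninstall docx` followed by `pip install python-docx`, run in the same Anaconda environment the notebook uses.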