I am trying to read a .docx file into Python with python-docx. The file is organized into two tables (it's messy), one with Chinese characters and the other with English. However, when I read the text from these tables, some of it is missing: the characters inside the square brackets in the Chinese text and a parenthesized number in the English text do not show up.
I read the text from the .docx file as follows:
from docx import Document

doc = Document('2003 PPC for corpus.docx')

# First cell of the first row: the Chinese text
chinese_text = doc.tables[0].rows[0].cells[0].text
print(chinese_text)

# Second cell of the same row: the English text, encoded to UTF-8 bytes
english_text = doc.tables[0].rows[0].cells[1].text.encode('utf-8')
print(english_text)
These print statements then show:
[]女士们,先生们,
and
b"Good morning ladies and gentlemen, we are very honor
My question is: why am I not reading the characters inside the square brackets in the Chinese text? And why am I not reading the "(3)" at the start of the English text?
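One possible way to investigate, offered as a sketch and not a confirmed explanation: as far as I understand, depending on the python-docx version, `.text` may only join runs that are direct children of a paragraph, so text wrapped in another element (a tracked insertion, a hyperlink, a field) could be skipped. The helper below uses only the standard library to collect every `<w:t>` node in an XML string, whatever its nesting. The name `all_text_nodes` and the `sample` fragment are hypothetical and made up for illustration; they are not taken from the actual file.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by .docx document XML
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def all_text_nodes(xml_string):
    """Return the text of every <w:t> element, however deeply it is nested."""
    root = ET.fromstring(xml_string)
    return [t.text or "" for t in root.iter(f"{{{W_NS}}}t")]


# Hypothetical fragment (not from the real file): the text between the
# brackets sits inside a tracked insertion (<w:ins>), so its run is not
# a direct child of the paragraph.
sample = (
    f'<w:p xmlns:w="{W_NS}">'
    '<w:r><w:t>[</w:t></w:r>'
    '<w:ins><w:r><w:t>1</w:t></w:r></w:ins>'
    '<w:r><w:t>]</w:t></w:r>'
    '</w:p>'
)

print("".join(all_text_nodes(sample)))
```

With python-docx, the XML for a cell can probably be obtained from `doc.tables[0].rows[0].cells[0]._tc.xml`; `_tc` is a private attribute, so treat that as an assumption that may change between versions. If the missing text still does not appear in any `<w:t>` node, it may not be stored as text at all. Automatic list numbering such as "(3)", for example, is generated by Word from the numbering definitions and is not written into the paragraph's runs.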