import collections

import docx

filename = 'Missing_Assignments.docx'
filehandle = docx.Document(filename)

# Collect non-empty paragraphs, skipping the 'Assignment ...' heading lines.
# Filtering here avoids removing items from a list while iterating over it,
# which silently skips elements.
listofnames = []
for student in filehandle.paragraphs:
    text = student.text
    # The original test (> 1 or > 20) reduces to > 1.
    if len(text) > 1 and not text.startswith('Assignment'):
        listofnames.append(text)

counts = collections.Counter(listofnames)

# Append a blank separator, then one "name count" line per student.
filehandle.add_paragraph('\n')
for name, count in counts.items():
    filehandle.add_paragraph(name + ' ' + str(count))
filehandle.save(filename)
print('Complete!')
This is more of a learning/efficiency question. If it isn't considered appropriate here, please let me know which forums would be more suitable.
My question is: why do I have to use docx? I'm used to creating a simple file handle like:
filehandle = open(filename)
and then iterating through the file line by line. I was getting all kinds of Unicode errors before I switched to the python-docx library. It just seems slightly more complicated, because I have to use its vocabulary instead of directly iterating through each line of text like I normally would.
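A minimal sketch of why a plain open() trips over this file: a .docx is a ZIP archive of XML parts rather than a text file, so reading it line by line yields compressed bytes instead of paragraphs. The helper name `docx_parts` is hypothetical; it uses only the standard library's zipfile module to list what is stored inside the archive.

```python
import zipfile


def docx_parts(path):
    """Return the names of the files stored inside a .docx archive.

    Hypothetical helper for illustration only. Returns an empty list
    when the path is not a ZIP archive (for example, a plain text file).
    """
    if not zipfile.is_zipfile(path):
        return []
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()
```

Calling `docx_parts('Missing_Assignments.docx')` on a typical Word document should list entries such as 'word/document.xml', which is where the body text normally lives. python-docx unpacks and parses that XML, which is why it exposes paragraphs rather than lines.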
- Also, does anyone know of a way to break out the counting shown here by class period? I want to count the number of times a name appears across the missing assignments, but only within a given period. Other periods may have students with the same name, which would throw off the counts.
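One possible way to keep the counts separate per period, sketched under an assumption the post does not confirm: that each period's block in the document begins with a heading paragraph starting with 'Period' (for example 'Period 1'). The function name `count_by_period` and both prefixes are hypothetical. It works on a plain list of paragraph strings, so it could be fed `[p.text for p in filehandle.paragraphs]`.

```python
import collections


def count_by_period(lines, period_prefix='Period', skip_prefix='Assignment'):
    """Count names separately for each period.

    Assumes (hypothetically) that a line starting with `period_prefix`
    opens a new period, and that lines starting with `skip_prefix` are
    assignment headings rather than student names.
    """
    counts = collections.defaultdict(collections.Counter)
    current = None  # names seen before any period heading land under None
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank paragraphs
        if text.startswith(period_prefix):
            current = text
        elif not text.startswith(skip_prefix):
            counts[current][text] += 1
    return {period: dict(names) for period, names in counts.items()}
```

Keying the counters by period means two students with the same name in different periods are never merged. If the document marks periods some other way, only the `period_prefix` test would need to change.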