
I want to extract text from a .doc file that is uploaded to a FastAPI or Flask endpoint.

So what I have is the raw contents of the .doc file in a buffer (bytes), not a path on disk, and I need to extract the text from that.

If I had a file path, I would just do:

import os
os.system("antiword 'files/file.doc'")

But in my case I only have the uploaded bytes. How can I extract the text from them? I don't want to use any third-party libraries.

martineau
user_12
  • You will have to parse the plain text output of Antiword. I suggest you use the `subprocess` module instead of `os.system()` because it should make it relatively easy to capture the output from running Antiword. – martineau Nov 10 '20 at 19:35
  • @martineau How can I parse the plain text output? Can you provide some code? I don't have a file path, I only have the .doc file's data in binary form. – user_12 Nov 10 '20 at 22:21
  • No, I have no code to provide, nor any idea of what the plain text output Antiword produces looks like. I suggest you create some sample Word files and see what it produces for them. This should give you some idea of what needs to be done. Another possibility would be to read the Antiword source code and figure out how it does things. – martineau Nov 10 '20 at 23:11
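
A minimal sketch of the `subprocess` approach suggested in the comments, using only the standard library. It assumes the `antiword` executable is installed and on the PATH, and that its output can be decoded as UTF-8 (the actual encoding may depend on the locale). The function name `extract_doc_text` and its `command` parameter are hypothetical names chosen for illustration; `command` exists only so a different converter can be swapped in.

```python
import os
import subprocess
import tempfile


def extract_doc_text(data: bytes, command=("antiword",)) -> str:
    """Write uploaded .doc bytes to a temporary file and run a converter on it.

    `command` is the converter invocation without the file path; the default
    assumes `antiword` is installed and on the PATH.
    """
    # Antiword is invoked with a file path here, so persist the bytes first.
    with tempfile.NamedTemporaryFile(suffix=".doc", delete=False) as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        # capture_output collects stdout instead of printing it, which is
        # what os.system() cannot do; check=True raises on a non-zero exit.
        result = subprocess.run([*command, path], capture_output=True, check=True)
        # Assumption: the output is UTF-8; undecodable bytes are replaced.
        return result.stdout.decode("utf-8", errors="replace")
    finally:
        # Remove the temporary file whether or not the conversion succeeded.
        os.unlink(path)
```

In a FastAPI handler the bytes would typically come from `await file.read()` on an `UploadFile`, and in Flask from `request.files["file"].read()` (the form field name `"file"` is an assumption); either result can be passed directly to `extract_doc_text`.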

0 Answers