How can I extract text from '.odt' and '.doc' files located at a URL using Python? I searched for a way to do this but couldn't find anything.
Any lead would be helpful.
from odf import text, teletype
from odf.opendocument import load
# Load the document and collect every paragraph element
textdoc = load(r"C:\Users\OMS\Downloads\sample1.odt")
allparas = textdoc.getElementsByType(text.P)
for para in allparas:
    # Flatten each paragraph to plain text and print it
    print(teletype.extractText(para))
This works for a local .odt file, but now I need to extract the text from a file at a URL such as
"https://abc.s3.ap-south-1.amazonaws.com/sample1.odt"
Assume the connection to AWS S3 has already been set up using boto3.
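One possible direction for the .odt case, as a rough sketch rather than a confirmed solution: download the file into memory and pass odfpy a file-like object instead of a path. This assumes that `load()` accepts a file-like object (it appears to hand its argument to `zipfile.ZipFile`), and that the URL is publicly readable or presigned. The helper names `download_to_buffer`, `read_s3_object` and `extract_odt_text` are made up for illustration, and the bucket/key split is inferred from the URL above. It does not cover '.doc' files.

```python
import io
import urllib.request


def download_to_buffer(url):
    """Fetch the URL and return its bytes wrapped in an in-memory file object."""
    # Only works if the object is publicly readable or the URL is presigned.
    with urllib.request.urlopen(url) as response:
        return io.BytesIO(response.read())


def read_s3_object(bucket, key):
    """Alternative for private objects: read the bytes through boto3."""
    import boto3  # imported lazily so the other helpers work without boto3

    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"]
    return io.BytesIO(body.read())


def extract_odt_text(fileobj):
    """Return the plain text of every paragraph in an .odt file-like object."""
    from odf import text, teletype
    from odf.opendocument import load

    doc = load(fileobj)  # assumption: load() accepts a file-like object
    return [teletype.extractText(p) for p in doc.getElementsByType(text.P)]


if __name__ == "__main__":
    url = "https://abc.s3.ap-south-1.amazonaws.com/sample1.odt"
    for line in extract_odt_text(download_to_buffer(url)):
        print(line)
```

For a private bucket, `extract_odt_text(read_s3_object("abc", "sample1.odt"))` would be the equivalent call, assuming the bucket is named abc as the URL suggests.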