I want to let a user submit an MS Word file to my app, process it with the python-docx library, and return it. Since the file might be big, I do not want to save it to the file system after processing but rather return it directly for download.
Get file from stream - this works
import docx
from docx.document import Document
from StringIO import StringIO
source_stream = StringIO(request.vars['file'].value)
document = docx.Document(source_stream)
source_stream.close()
process_doc(document)
Return it as a stream - this does not work
The app does prompt the user to download the file, but MS Word can't open it, saying "because some part is missing or invalid".
def download(document, filename):
    import contenttype as c
    import cStringIO
    out_stream = cStringIO.StringIO()
    document.save(out_stream)
    response.headers['Content-Type'] = c.contenttype(filename)
    response.headers['Content-Disposition'] = \
        "attachment; filename=%s" % filename
    return out_stream.getvalue()
I've found "Upload a StringIO object with send_file()", but that pertains to the Flask framework. I would rather use the web2py framework.
Update 1
Some have suggested moving the file pointer to the start of the document data before sending it to the output stream. But how do I do that?
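For reference, rewinding an in-memory stream appears to be a single seek(0) call. Below is a minimal sketch using only the standard library's io.BytesIO; the payload bytes are placeholder data standing in for document.save(out_stream), not a real .docx. Note that getvalue() returns the whole buffer regardless of the pointer position, so the rewind should only matter when something calls read() on the stream:

```python
from io import BytesIO

out_stream = BytesIO()
# Placeholder payload; in the real code this would be document.save(out_stream).
out_stream.write(b"placeholder payload")

# After writing, the pointer sits at the end, so read() returns nothing.
at_end = out_stream.read()

# Move the pointer back to the start of the data.
out_stream.seek(0)
after_rewind = out_stream.read()

# getvalue() ignores the pointer and always returns the full buffer.
full_buffer = out_stream.getvalue()
```

Since the download function above returns out_stream.getvalue(), a missing seek(0) alone would probably not explain the damaged file.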
Update 2
As @scanny suggested, I've created an empty document with document = docx.Document() and made it downloadable from a file-like object using the BytesIO class:
import os
from io import BytesIO
document = docx.Document()
out_stream = BytesIO()
document.save(out_stream)
filename = 'temporal_file.docx'
filepath = os.path.join(request.folder, 'uploads', filename)
try:
    # also write a copy to disk so it can be checked manually
    with open(filepath, 'wb') as f:
        f.write(out_stream.getvalue())
    response.flash = 'Success to open file for writing'
    response.headers['Content-Disposition'] = "attachment; filename=%s" % filename
    response.headers['Content-Type'] = 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
    #response['X-Sendfile'] = filepath
    #response['Content-Length'] = os.stat(filepath).st_size
    return out_stream.getvalue()
except Exception as e:
    response.flash = 'Error open file for writing or download it - ' + filename
    return
As seen in the code, I also write that empty file to the file system. I could easily download that copy manually and open it in MS Word.
So the question remains: why is the MS Word file downloaded through the output stream damaged, so that MS Word cannot open it?
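To narrow this down, it may help to compare the copy on disk with the downloaded copy byte for byte. A .docx file is a ZIP archive, so a quick validity check is possible with the standard library alone. The sketch below is only an illustration: describe() is a hypothetical helper, and both byte strings are placeholder data rather than my real files.

```python
import hashlib
import zipfile
from io import BytesIO


def describe(data):
    """Summarise a byte string: length, SHA-256 digest, and whether it parses as a ZIP."""
    return {
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "is_zip": zipfile.is_zipfile(BytesIO(data)),
    }


# Placeholder: a tiny in-memory ZIP standing in for the file saved on disk.
buf = BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    archive.writestr("word/document.xml", "<w:document/>")
original = buf.getvalue()

# Placeholder: altered bytes standing in for a damaged download.
downloaded = b"<html>unexpected wrapper</html>"

original_info = describe(original)
downloaded_info = describe(downloaded)
identical = original_info["sha256"] == downloaded_info["sha256"]
```

In practice, original and downloaded would each be read with open(path, 'rb'). A difference in size or digest would suggest that something between the controller and the browser alters the bytes.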
Update 3
I've eliminated python-docx from the process of writing the file to the output stream. The result was the same: after downloading, the file can't be opened in MS Word. Code:
# we load without python-docx library
import os
from io import BytesIO
try:
    filename = 'empty_file.docx'
    filepath = os.path.join(request.folder, 'uploads', filename)
    # read a file from file system (disk)
    with open(filepath, 'rb') as f:
        out_stream = BytesIO(f.read())
    response.flash = 'Success to open file for reading'
    response.headers['Content-Disposition'] = "attachment; filename=%s" % filename
    response.headers['Content-Type'] = 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
    return out_stream.getvalue()
except Exception as e:
    response.flash = 'Error open file for reading or download it - ' + filename
    return