I need to fetch .doc or .docx files from external sites, convert them to PDF, and return the content. I then add a Content-Type header to that content, publish it through my CMS, cache it on a CDN, and display it within HTML using the Adobe PDF Embed API. I'm using Python 3.7.
As a test, this works:
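import os
import subprocess
from time import sleep

def generate_pdf():
    # soffice writes anyfilename.pdf into the current working directory
    subprocess.call(['soffice', '--convert-to', 'pdf',
                     'https://arbitrary.othersite.com/anyfilename.docx'])
    sleep(1)  # give the converted file time to appear
    with open('anyfilename.pdf', 'rb') as myfile:
        content = myfile.read()
    os.remove('anyfilename.pdf')
    return content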
This would be nice:
def generate_pdf(url):
    # Wished-for behaviour only: subprocess.call returns the exit code, not the PDF
    result = subprocess.call(['soffice', '--convert-to', 'pdf', url])
    content = result
    return content
The URLs could include any parameters or illegal characters, which makes the resulting file name hard to predict. In any case, I would prefer not to have to sleep, save, read, and delete the converted file.
Is this possible?
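For reference, here is a minimal sketch of one direction that might avoid guessing the file name. It downloads the document to a fixed local name inside a temporary directory and points soffice at that directory with `--outdir`. The names `fetch_bytes`, `build_command`, `input.docx`, and the `fetch`/`run` parameters are hypothetical and exist only for illustration. The sketch assumes that soffice accepts `--headless` and `--outdir` as in current LibreOffice releases, and that LibreOffice detects the document format from the content rather than the extension, which has not been verified here for .doc input.

```python
import os
import subprocess
import tempfile
import urllib.request


def fetch_bytes(url):
    """Download the source document; the URL is passed through untouched."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def build_command(input_path, outdir):
    """Build the soffice call; --outdir sets where the PDF is written."""
    return ['soffice', '--headless', '--convert-to', 'pdf',
            '--outdir', outdir, input_path]


def generate_pdf(url, fetch=fetch_bytes, run=subprocess.run):
    """Return the PDF bytes for the document at url.

    fetch and run are hypothetical injection points so the flow can be
    exercised without network access or a LibreOffice install.
    """
    # The temporary directory and both files are removed when the block exits.
    with tempfile.TemporaryDirectory() as workdir:
        # Fixed local name, so the output name never depends on the URL.
        input_path = os.path.join(workdir, 'input.docx')
        with open(input_path, 'wb') as source:
            source.write(fetch(url))

        # run() blocks until the soffice process exits and raises on a
        # non-zero exit code because of check=True.
        run(build_command(input_path, workdir), check=True)

        with open(os.path.join(workdir, 'input.pdf'), 'rb') as converted:
            return converted.read()
```

Calling `generate_pdf('https://arbitrary.othersite.com/anyfilename.docx')` would then return the PDF bytes. A file is still written, but only inside a throwaway directory, so there is no explicit delete step and no name to guess.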