I have a small AWS EC2 instance with 2GB of memory. I'm trying to convert a .pptx file to PDF using unoconv and LibreOffice. The code works on my local machine, but when I deploy it to AWS it only converts files smaller than 20MB and dies when the file is larger. I'm looking for a way to convert in chunks so that the server doesn't have to read the whole file at once. Note that this endpoint is called using Axios, so the response must be sent in a way that Axios can stream and save with fs.createWriteStream.
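To make the requirement concrete, here is a minimal sketch of the kind of chunked response I'm aiming for, assuming the converted PDF already exists on disk (the /download-pdf route and converted.pdf path are placeholders for illustration). Because Flask streams the generator, the Axios caller can request it with responseType: 'stream' and pipe it into fs.createWriteStream.

    from flask import Flask, Response

    app = Flask(__name__)

    @app.route('/download-pdf')
    def download_pdf():
        output_path = 'converted.pdf'  # assumed to already exist on disk

        def read_in_chunks(path, chunk_size=4096):
            # Yield the file piece by piece so the whole PDF is never held in memory
            with open(path, 'rb') as f:
                while True:
                    chunk = f.read(chunk_size)
                    if not chunk:
                        break
                    yield chunk

        return Response(
            read_in_chunks(output_path),
            content_type='application/pdf',
            headers={'Content-Disposition': 'attachment; filename=converted.pdf'},
        )

That is the behavior I want, except the conversion itself also needs to happen without exhausting memory.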
Here is my code:
    import atexit
    import os
    import secrets
    import subprocess

    from flask import Flask, Response, jsonify, request

    app = Flask(__name__)

    chunk_size = 4096  # Adjust this chunk size as needed

    @app.route('/convert-pptx', methods=['POST'])
    def convert_pptx_file():
        try:
            uploaded_file = request.files['document']
            file_path = f'uploaded_{secrets.token_hex(8)}.pptx'
            new_name = file_path.replace(".pptx", ".pdf")
            uploaded_file.save(file_path)

            # Use subprocess.Popen for streaming conversion
            command = ['unoconv', '--format=pdf', file_path]
            with subprocess.Popen(command, stdout=subprocess.PIPE, bufsize=chunk_size) as process:
                def generate_pdf_from_pptx():
                    # Yield unoconv's stdout to the client piece by piece
                    while True:
                        chunk = process.stdout.read(chunk_size)
                        if not chunk:
                            break
                        yield chunk

                response = Response(
                    generate_pdf_from_pptx(),
                    content_type='application/pdf',
                    headers={'Content-Disposition': f'attachment; filename={new_name}'}
                )

                # Clean up
                os.remove(file_path)
                atexit.register(lambda: os.remove(new_name))
                return response
        except Exception as e:
            return jsonify({'error': str(e)}), 500
I expect to be able to convert files larger than 20MB, but the program hangs, and when I stop it I get this error:
uno.RuntimeException: Binary URP bridge disposed during call