
I am using the LibreOffice headless converter to automatically convert uploaded files, and I am getting this error:

LibreOffice 7 fatal error - Application cannot be started

Ubuntu ver: 21.04

What I have tried: I download the file from Azure Blob Storage, save it to BASE_DIR/input_files, convert it to PDF with a Linux command that I run through subprocess, and write the result to the BASE_DIR/output_files folder.

Below is my code:

I install LibreOffice in the Docker image this way:

RUN apt-get update \
&& ACCEPT_EULA=Y apt-get -y install LibreOffice

The main logic:

blob_client = container_client.get_blob_client(f"Folder_with_reports/")

with open(os.path.join(BASE_DIR, f"input_files/{filename}"), "wb") as source_file:
    source_file.write(data)

source_file = os.path.join(BASE_DIR, f"input_files/{filename}")  # original docs here
output_folder = os.path.join(BASE_DIR, "output_files")   # pdf files will be here

# assign the command of converting files through LibreOffice
command = rf"lowriter --headless --convert-to pdf {source_file} --outdir {output_folder}"

# running the command
subprocess.run(command, shell=True)

# reading the file and uploading it back to Azure Storage
with open(os.path.join(BASE_DIR, f"output_files/MyFile.pdf"), "rb") as outp_file:
    outp_data = outp_file.read()

blob_name_ = f"test"
container_client.upload_blob(name = blob_name_ ,data = outp_data, blob_type="BlockBlob")

Should I install lowriter instead of LibreOffice? Is it okay to use BASE_DIR for this kind of operation? I would appreciate any suggestions.

Orkhan Mammadov

2 Answers


Partial solution:

Here I have simplified the case and created an additional Docker image with the Dockerfile below. I tried both methods: unoconv and direct conversion with LibreOffice.

Dockerfile:

FROM ubuntu:21.04

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get -y upgrade && \
    apt-get -y install python3.10 && \
    apt update && apt install python3-pip -y

# Method1 - installing LibreOffice and java
RUN apt-get --no-install-recommends install libreoffice -y
RUN apt-get install -y libreoffice-java-common

# Method2 - additionally installing unoconv (-y so the build does not stop at a prompt)
RUN apt-get install -y unoconv

ARG CACHEBUST=1

ADD BASE.py /code/BASE.py

# copy the input doc/docx files into the image
COPY /input_files /code/input_files

CMD ["/code/BASE.py"]
ENTRYPOINT ["python3"]

BASE.py:

import os
import subprocess

BASE_DIR = "/code"
input_folder = os.path.join(BASE_DIR, "input_files")    # original documents
output_folder = os.path.join(BASE_DIR, "output_files")  # pdf files will be here
os.makedirs(output_folder, exist_ok=True)

for filename in os.listdir(input_folder):

    source_file = os.path.join(input_folder, filename)   # original document

    # METHOD 1 - calling LibreOffice directly
    # build the conversion command as an argument list, so no shell quoting is needed
    convert_to_pdf = ["libreoffice", "--headless", "--convert-to", "pdf", source_file, "--outdir", output_folder]
    subprocess.run(convert_to_pdf, check=True)
    print(f"file {filename} converted")

    ## METHOD 2 - using unoconv - also working
    # convert_to_pdf = ["unoconv", "-f", "pdf", source_file]
    # subprocess.run(convert_to_pdf, check=True)

# list the converted files
subprocess.run(["ls", output_folder])

The methods above work when the files are already in the image's filesystem at build time. I still have not found a way to write files into the filesystem after the image has been built.
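For files that only arrive at runtime, one direction that may be worth trying is to write the downloaded bytes into a temporary folder inside the running container and convert them there. The sketch below is untested against the setup in the question. The helper names build_convert_command and convert_bytes_to_pdf are hypothetical, and it assumes that the "Application cannot be started" error comes from LibreOffice not having a writable profile directory, which is why it passes -env:UserInstallation pointing into the temporary folder. That assumption may not hold in every environment.

```python
import subprocess
import tempfile
from pathlib import Path


def build_convert_command(source_file: Path, output_dir: Path, profile_dir: Path) -> list[str]:
    """Build the headless LibreOffice command as an argument list (hypothetical helper)."""
    return [
        "libreoffice",
        # Assumption: a missing or read-only profile directory is what prevents
        # LibreOffice from starting, so point it at a writable location.
        f"-env:UserInstallation={profile_dir.as_uri()}",
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(output_dir),
        str(source_file),
    ]


def convert_bytes_to_pdf(data: bytes, filename: str) -> bytes:
    """Write uploaded bytes to a temporary folder, convert them, return the PDF bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        work_dir = Path(tmp)
        # Keep only the file name in case the blob name contains folder parts.
        source_file = work_dir / Path(filename).name
        source_file.write_bytes(data)

        command = build_convert_command(source_file, work_dir, work_dir / "lo_profile")
        # check=True raises CalledProcessError on a non-zero exit code;
        # the captured stderr is then available on the exception.
        subprocess.run(command, check=True, capture_output=True, text=True, timeout=120)

        # LibreOffice names the output after the input file with a .pdf extension.
        # If no PDF was produced, read_bytes() raises FileNotFoundError.
        return source_file.with_suffix(".pdf").read_bytes()
```

With something like this, the bytes returned by convert_bytes_to_pdf(data, filename) could be passed straight to container_client.upload_blob, so the output file name no longer has to be hardcoded.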

Orkhan Mammadov

I created something similar: an API that uses unoserver and LibreOffice to convert files into images for previews and thumbnails. Please have a look here: https://github.com/Nowi5/file-preview-api

Simon
While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - From Review – doneforaiur Aug 16 '23 at 06:40