
What I'm trying to achieve is to compare two images (engineering drawings as PDFs) that are uploaded through Streamlit's 'st.file_uploader()'. The code below works well, but how can I do the same without Poppler?

import io

import cv2
import imutils
import numpy as np
import streamlit as st
from pdf2image import convert_from_bytes
from PIL import Image

def pdf_to_png_bytes(pdfFile):
    # pdfFile is the object returned by st.file_uploader(); read its raw bytes
    pdf_bytes = pdfFile.read()
    # Rasterise the pages to PIL images (this is the step that needs Poppler)
    pages = convert_from_bytes(pdf_bytes, poppler_path=r"poppler\bin")

    # Encode only the first page as PNG, in memory
    buffer = io.BytesIO()
    pages[0].save(buffer, 'png')

    return buffer.getvalue()

After spending several days looking for an alternative, I couldn't reproduce these steps without pdf2image and Poppler. The reason is that I cannot include Poppler in my company's Docker setup.

After I get the PNG byte string from 'pdf_to_png_bytes', I use the function below to compare both images:

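def calculate_image_diff(img1, img2):
    # Decode the PNG byte strings into 3-channel BGR arrays
    original = cv2.imdecode(np.frombuffer(img1, np.uint8), cv2.IMREAD_COLOR)
    new = cv2.imdecode(np.frombuffer(img2, np.uint8), cv2.IMREAD_COLOR)

    # Per-pixel absolute difference; both images must have the same size
    diff = cv2.absdiff(original, new)
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)

    # Dilate so that nearby changed pixels merge into one region
    dilated = cv2.dilate(gray, None, iterations=3)

    # Any non-zero pixel counts as a change
    _, thresh = cv2.threshold(dilated, 0, 255, cv2.THRESH_BINARY)

    cnts = cv2.findContours(thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    cnts = imutils.grab_contours(cnts)

    # Draw a green box around every changed region on the new drawing
    for c in cnts:
        (x, y, w, h) = cv2.boundingRect(c)
        cv2.rectangle(new, (x, y), (x + w, y + h), (0, 255, 0), 2)

    # OpenCV stores pixels as BGR, while PIL expects RGB
    return Image.fromarray(cv2.cvtColor(new, cv2.COLOR_BGR2RGB))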

Like I said, this code works perfectly... but I need to find an alternative without using Poppler. Can you help me?

  • Most of the PDF-to-image converters I found work with files at some path, but in this case I'm using Streamlit and the PDFs are uploaded as BytesIO, which is making me very confused about how to work with them. I don't think paid solutions are an option. – Victor Dias Sep 21 '22 at 14:53
