You can edit the file directly in your installed copy of PaddleOCR, e.g. C:\Python3.10.0\Lib\site-packages\paddleocr\ppocr\utils\utility.py, starting at line 93:

    with fitz.open(img_path) as pdf:
        for pg in range(0, pdf.page_count):  # was pdf.pageCount
            page = pdf[pg]
            mat = fitz.Matrix(2, 2)
            pm = page.get_pixmap(matrix=mat, alpha=False)  # was page.getPixmap
            # if width or height > 2000 pixels, don't enlarge the image
            if pm.width > 2000 or pm.height > 2000:
                pm = page.get_pixmap(matrix=fitz.Matrix(1, 1), alpha=False)
            img = Image.frombytes("RGB", [pm.width, pm.height], pm.samples)
            img = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR)
            imgs.append(img)
        return imgs, False, True

I changed the camelCase names below to snake_case, because newer PyMuPDF (fitz) releases no longer provide the old camelCase names:
pageCount -> page_count
getPixmap -> get_pixmap
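If the same code has to run against both older and newer PyMuPDF versions, one option is to look up the snake_case name first and fall back to the camelCase one. The helper below is a minimal sketch; `resolve` is a hypothetical name, not part of PaddleOCR or PyMuPDF, and it uses `hasattr` rather than a truthiness check so that a property such as `page_count` equal to 0 is still returned correctly.

```python
from typing import Any


def resolve(obj: Any, new_name: str, old_name: str) -> Any:
    """Hypothetical helper: return obj.<new_name> if it exists, else obj.<old_name>.

    Prefers the snake_case name used by newer PyMuPDF releases and falls back
    to the camelCase name used by older ones. Raises AttributeError if neither
    attribute exists.
    """
    if hasattr(obj, new_name):
        return getattr(obj, new_name)
    return getattr(obj, old_name)
```

Inside the loop above it could be used as `resolve(pdf, "page_count", "pageCount")` for the page count and `resolve(page, "get_pixmap", "getPixmap")(matrix=mat, alpha=False)` for the pixmap; simply renaming the calls, as shown above, is enough if you only need to support a recent PyMuPDF.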
For more details, see this discussion: https://github.com/PaddlePaddle/PaddleOCR/discussions/8972