
I am currently working on detecting Vietnamese text in images.

To detect the text in an image I am using PaddleOCR detection, because I need line-by-line detection, and it gives perfect results for that. Recognition could also be done with PaddleOCR, but since PaddleOCR wasn't trained on Vietnamese, its recognition results are not accurate.

For recognition I am using VietOCR, which gives perfect results. The problem is that VietOCR only works when I pass a cropped image, not the full image.

My plan is to crop the image into multiple images using the bounding box coordinates generated by PaddleOCR.
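The plan (detect, crop, recognize) can be sketched as a small driver loop. This is a minimal sketch, not tied to either library: `read_lines`, `crop` and `recognize` are hypothetical names, where `crop` would be backed by whatever cropping code you use and `recognize` would wrap VietOCR's `Predictor.predict`. Sorting boxes top-to-bottom, then left-to-right, is an assumption about reading order.

```python
from typing import Callable, List, Sequence, TypeVar

Img = TypeVar("Img")
Point = Sequence[float]
Box = Sequence[Point]  # four (x, y) corners, as printed by PaddleOCR


def read_lines(
    image: Img,
    boxes: List[Box],
    crop: Callable[[Img, Box], Img],    # hypothetical: cuts one box out of the image
    recognize: Callable[[Img], str],    # hypothetical: e.g. wraps VietOCR's Predictor.predict
) -> List[str]:
    """Crop every detected line and run recognition on it, top line first."""
    # Assumed reading order: by the box's top edge, then by its left edge.
    ordered = sorted(boxes, key=lambda b: (min(p[1] for p in b), min(p[0] for p in b)))
    return [recognize(crop(image, box)) for box in ordered]
```

Keeping detection, cropping and recognition behind plain callables makes it easy to swap a rectangular crop for a rotated one without touching the loop.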

Sample code for the text detection with PaddleOCR:

```python
from paddleocr import PaddleOCR, draw_ocr
from PIL import Image

# PaddleOCR supports Chinese, English, French, German, Korean and Japanese.
# You can set the parameter `lang` to `ch`, `en`, `french`, `german`, `korean` or `japan`
# to switch the language model.
ocr = PaddleOCR(use_angle_cls=True)  # needs to run only once to download and load the model into memory
img_path = '/content/im1502.jpg'
result = ocr.ocr(img_path, cls=True)

# result holds one list per input image; each entry is [box, (text, score)]
for idx in range(len(result)):
    res = result[idx]
    for line in res:
        print(line)

# draw result
result = result[0]  # only one image was passed
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]  # four corner points per detected line
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts=None, scores=None, font_path='/path/to/PaddleOCR/doc/fonts/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```

Detected bounding boxes (`boxes`):

```
[[[108.0, 78.0], [289.0, 93.0], [286.0, 131.0], [104.0, 116.0]],
 [[51.0, 230.0], [267.0, 235.0], [266.0, 272.0], [50.0, 267.0]],
 [[17.0, 304.0], [343.0, 304.0], [343.0, 340.0], [17.0, 340.0]]]
```
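Each box is four (x, y) corners, so the simplest crop is the axis-aligned rectangle that encloses them, cut with array slicing. A minimal sketch, assuming the image is already loaded as a NumPy array (for example from `cv2.imread` or `np.asarray` on a PIL image); `crop_axis_aligned` is a hypothetical helper name:

```python
import numpy as np


def crop_axis_aligned(img: np.ndarray, box) -> np.ndarray:
    """Cut the axis-aligned rectangle enclosing a 4-point PaddleOCR box.

    `img` is an H x W (x C) array; `box` is [[x, y], [x, y], [x, y], [x, y]].
    """
    pts = np.asarray(box, dtype=np.float32).reshape(-1, 2)
    h, w = img.shape[:2]
    # Round outward and clamp to the image so slightly out-of-range boxes still work.
    x0 = max(int(np.floor(pts[:, 0].min())), 0)
    y0 = max(int(np.floor(pts[:, 1].min())), 0)
    x1 = min(int(np.ceil(pts[:, 0].max())), w)
    y1 = min(int(np.ceil(pts[:, 1].max())), h)
    # NumPy images are indexed [row, column], i.e. [y, x].
    return img[y0:y1, x0:x1]
```

With Pillow, the same rectangle can be cut with `Image.crop((x0, y0, x1, y1))`. Because the first box is tilted by a few degrees, its rectangle also picks up some background above and below the text; for strongly tilted lines a perspective crop fits better.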

Recognition with VietOCR:

```python
import matplotlib.pyplot as plt
from PIL import Image

from vietocr.tool.predictor import Predictor
from vietocr.tool.config import Cfg

config = Cfg.load_config_from_name('vgg_transformer')

# config['weights'] = './weights/transformerocr.pth'
# config['weights'] = 'https://drive.google.com/uc?id=13327Y1tz1ohsm5YZMyXVMPIOjoOA0OaA'
config['cnn']['pretrained'] = False
# config['device'] = 'cuda:0'
config['predictor']['beamsearch'] = False

detector = Predictor(config)

img = Image.open('/content/im1502.jpg')
plt.imshow(img)
result = detector.predict(img)  # the full image is passed here, not a crop
print(result)
```

The result is:

SẢNH CHUNG

How can I crop the image using the PaddleOCR bounding box coordinates?

Christoph Rackwitz
ram_98

2 Answers


Thanks, I figured it out. Here `boxes` is the list of bounding boxes returned by PaddleOCR (`boxes = [line[0] for line in result]` in the question):

```python
import cv2
import numpy as np

img = cv2.imread(img_path)
height, width = img.shape[:2]

crops = []
for b in boxes:
    box = np.array(b).astype(np.int32).reshape(-1, 2)
    # use a fresh mask per box so earlier polygons don't leak into later crops
    mask = np.zeros((height, width), dtype=np.uint8)
    cv2.fillPoly(mask, [box], 255)
    # keep only the pixels inside the polygon; everything else becomes black
    res = cv2.bitwise_and(img, img, mask=mask)
    x, y, w, h = cv2.boundingRect(box)  # axis-aligned rectangle around the polygon
    cropped = res[y:y + h, x:x + w]
    crops.append(cropped)  # one crop per detected line
```

`cropped` is the part of the image inside the current bounding box.
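One detail when passing these crops to VietOCR: `cv2.imread` returns a NumPy array with channels in BGR order, whereas the question calls `Predictor.predict` with a PIL image opened by `Image.open`. A minimal sketch of the channel conversion, assuming a 3-channel uint8 crop; `bgr_to_rgb` is a hypothetical helper name:

```python
import numpy as np


def bgr_to_rgb(crop: np.ndarray) -> np.ndarray:
    """Reverse the channel axis of an H x W x 3 OpenCV (BGR) image to get RGB."""
    if crop.ndim != 3 or crop.shape[2] != 3:
        raise ValueError("expected an H x W x 3 image")
    # Copy into a C-contiguous array; the reversed slice alone is only a strided view.
    return np.ascontiguousarray(crop[:, :, ::-1])
```

Usage would then look like `detector.predict(Image.fromarray(bgr_to_rgb(cropped)))`. OpenCV's `cv2.cvtColor(cropped, cv2.COLOR_BGR2RGB)` performs the same conversion if you prefer to stay within OpenCV.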

ram_98

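For more information: PaddleOCR has a function that crops the image from the four corner points and straightens tilted boxes with a perspective transform. It uses Green's theorem to determine whether the points are ordered clockwise or counterclockwise.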

```python
import cv2
import numpy as np


def get_rotate_crop_image(img, points):
    # Use Green's theorem to judge clockwise or counterclockwise
    # use paddleocr rotate image function
    # author: biyanhua
    points = np.array(points, dtype=np.float32)  # copy, so the caller's box isn't modified
    d = 0.0
    for index in range(-1, 3):
        d += -0.5 * (points[index + 1][1] + points[index][1]) * (
            points[index + 1][0] - points[index][0])
    if d < 0:  # counterclockwise
        tmp = np.array(points)
        points[1], points[3] = tmp[3], tmp[1]

    try:
        # output size: the longer edge of each pair of opposite edges
        img_crop_width = int(
            max(
                np.linalg.norm(np.array(points[0]) - np.array(points[1])),
                np.linalg.norm(np.array(points[2]) - np.array(points[3]))))
        img_crop_height = int(
            max(
                np.linalg.norm(np.array(points[0]) - np.array(points[3])),
                np.linalg.norm(np.array(points[1]) - np.array(points[2]))))
        pts_std = np.float32([[0, 0], [img_crop_width, 0],
                              [img_crop_width, img_crop_height],
                              [0, img_crop_height]])
        # map the (possibly tilted) quadrilateral onto an upright rectangle
        M = cv2.getPerspectiveTransform(np.float32(points), pts_std)
        dst_img = cv2.warpPerspective(
            img,
            M, (img_crop_width, img_crop_height),
            borderMode=cv2.BORDER_REPLICATE,
            flags=cv2.INTER_CUBIC)
        dst_img_height, dst_img_width = dst_img.shape[0:2]
        # rotate crops that are much taller than they are wide
        if dst_img_height * 1.0 / dst_img_width >= 1.5:
            dst_img = np.rot90(dst_img)
        return dst_img
    except Exception as e:
        print(e)


# usage: img = cv2.imread(img_path); crops = [get_rotate_crop_image(img, box) for box in boxes]
```
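The two geometric steps in `get_rotate_crop_image` (the Green's theorem orientation sum and the choice of output size) can be checked without OpenCV. A minimal pure-Python sketch with hypothetical helper names, reproducing the same formulas and applying them to the first box printed in the question:

```python
import math
from typing import Sequence, Tuple

Point = Sequence[float]


def signed_area(points: Sequence[Point]) -> float:
    """Trapezoid sum from get_rotate_crop_image (discrete Green's theorem).

    Same loop as the answer: edges points[i] -> points[i + 1], starting with the
    closing edge points[-1] -> points[0]. In image coordinates (y points down),
    the order top-left, top-right, bottom-right, bottom-left gives a positive value.
    """
    d = 0.0
    for i in range(-1, len(points) - 1):
        (x0, y0), (x1, y1) = points[i], points[i + 1]
        d += -0.5 * (y1 + y0) * (x1 - x0)
    return d


def crop_size(points: Sequence[Point]) -> Tuple[int, int]:
    """(width, height) chosen by get_rotate_crop_image: the longer edge of each
    pair of opposite edges, truncated to int."""
    p0, p1, p2, p3 = points
    width = int(max(math.dist(p0, p1), math.dist(p2, p3)))
    height = int(max(math.dist(p0, p3), math.dist(p1, p2)))
    return width, height


if __name__ == "__main__":
    box = [[108.0, 78.0], [289.0, 93.0], [286.0, 131.0], [104.0, 116.0]]
    print(signed_area(box) > 0, crop_size(box))  # True (182, 38)
```

For that box the sum is positive, so the function keeps the point order and warps the tilted line into an upright 182 x 38 crop; a box listed in the opposite order would give a negative sum and have points 1 and 3 swapped first.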