
I'm working on a project about recognizing moroccan license plates which look like this image :

Moroccan License Plate

How can I use OpenCV to crop the license plate out, and Tesseract to read the numbers and the Arabic letter in the middle?

I have looked into this research paper : https://www.researchgate.net/publication/323808469_Moroccan_License_Plate_recognition_using_a_hybrid_method_and_license_plate_features

I have installed OpenCV and Tesseract for Python on Windows 10. When I run Tesseract on the text-only part of the license plate using the "fra" language, I get `7714315l Bv`. How can I separate the data?

Edit: The Arabic letters we use in Morocco are: أ ب ت ج ح د هـ. The expected result is: 77143 د 6. The vertical lines themselves are irrelevant, but I have to use them to split the image and read each part separately.

Thanks in advance!

Soufiane S
  • What is the expected result? – Rick M. Feb 18 '19 at 08:28
  • I want to read the numbers apart and the letter in the middle apart using "ara" language in tesseract. How can I separate the data using OpenCV? – Soufiane S Feb 18 '19 at 08:30
  • As far as I understand correctly, the _two straight vertical line segments in the middle_ are irrelevant right? – Rick M. Feb 18 '19 at 08:35
  • Yes I need to remove them, but before I have to use them to separate/"get 3 images" (cv2 objects). Then read/ocr them with tesseract. – Soufiane S Feb 18 '19 at 08:37

2 Answers


Since the two vertical lines are the separators, you can use the Hough transform to find them and crop the image:

import numpy as np
import cv2

image = cv2.imread("lines.jpg")
grayImage = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

dst = cv2.Canny(grayImage, 0, 150)
cv2.imwrite("canny.jpg", dst)

lines = cv2.HoughLinesP(dst, 1, np.pi / 180, 50, None, 60, 20)

lines_x = []
# Get height and width to constrain detected lines
height, width, channels = image.shape
for line in lines:
    l = line[0]
    # Keep only near-vertical lines away from the left quarter of the image
    angle = np.arctan2(l[3] - l[1], l[2] - l[0]) * 180.0 / np.pi
    if (l[2] > width / 4) and (l[0] > width / 4) and (70 < angle < 100):
        lines_x.append(l[2])
        # Uncomment to draw the detected lines:
        # cv2.line(image, (l[0], l[1]), (l[2], l[3]), (0, 0, 255), 3, cv2.LINE_AA)

#cv2.imwrite("lines_found.jpg", image)
# Sorting to get the line with the maximum x-coordinate for proper cropping
lines_x.sort(reverse=True)
crop_image = "cropped_lines"
for i in range(0, len(lines_x)):
    if i == 0:
        # Cropping to the end
        img = image[0:height, lines_x[i]:width]
    else:
        # Cropping from the start
        img = image[0:height, 0:lines_x[i]]
    cv2.imwrite(crop_image + str(i) + ".jpg", img)

Last segment

First segment

I am sure you know now how to get the middle part ;) Hope it helps!
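For completeness (as clarified in the comments below), the middle segment sits between the two detected x-coordinates. A minimal sketch of the slicing, assuming `lines_x` is sorted in descending order as above; the zero-filled array stands in for the loaded plate image:

```python
import numpy as np

# Dummy stand-in for the loaded plate image (height 50, width 200, 3 channels)
image = np.zeros((50, 200, 3), dtype=np.uint8)
height, width, _ = image.shape

# x-coordinates of the two vertical lines, sorted descending as in the code above
lines_x = [140, 60]

# Middle segment: from the left separator to the right separator
middle_part = image[0:height, lines_x[1]:lines_x[0]]
print(middle_part.shape)  # (50, 80, 3)
```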

EDIT:

Using some morphological operations, you can also extract the characters individually:

import numpy as np
import cv2

image = cv2.imread("lines.jpg")
grayImage = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

dst = cv2.Canny(grayImage, 50, 100)

# Close small gaps in the edges so each character forms a single blob
dst = cv2.morphologyEx(dst, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8),
                       iterations=1)
cv2.imwrite("canny.jpg", dst)

# OpenCV 3.x returns (image, contours, hierarchy); OpenCV 4.x returns
# (contours, hierarchy) — taking the last two values supports both
contours, hierarchy = cv2.findContours(dst, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_NONE)[-2:]

for cnt in contours:
    if cv2.contourArea(cnt) > 200:
        x, y, w, h = cv2.boundingRect(cnt)
        # The width constraint removes the thin vertical separator lines
        if w > 10:
            cv2.rectangle(image, (x, y), (x + w, y + h), (0, 0, 255), 1)

cv2.imwrite("contour.jpg", image)

Result:

contour result
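Note that `findContours` does not return the characters in reading order; if you want to feed them to the OCR one by one, sort the bounding boxes by their x-coordinate first. A small sketch with hypothetical boxes:

```python
# Hypothetical (x, y, w, h) boxes as returned by cv2.boundingRect
boxes = [(120, 10, 20, 40), (15, 12, 22, 38), (60, 11, 18, 41)]

# Sort left to right by the x-coordinate
boxes_sorted = sorted(boxes, key=lambda b: b[0])
print([b[0] for b in boxes_sorted])  # [15, 60, 120]
```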

Rick M.
  • Thank you so much! I'm new to OpenCV, please guide me on how to crop the middle part, it's the one in arabic but I don't know how to do it. – Soufiane S Feb 18 '19 at 09:38
  • You can crop the part from the x-coordinate of the second image to the x-coordinate of the first image; the x-coordinates are in lines_x and i is the position (0 for the first image, 1 for the second), i.e. `middle_part = image[0:height, lines_x[1]:lines_x[0]]` – Rick M. Feb 18 '19 at 09:42
  • It works! Now the only thing that remains is adding this code to the OpenCV license-plate-detection part I have and applying Tesseract. I have used this code: https://github.com/MicrocontrollersAndMore/OpenCV_3_License_Plate_Recognition_Python – Soufiane S Feb 18 '19 at 09:52
  • When I applied your code in a smaller picture it didn't work... `lines` is empty.. – Soufiane S Feb 18 '19 at 10:16
  • Can you add a link to the image in the comments? I can come up with a more robust method – Rick M. Feb 18 '19 at 10:22
  • I tried also with contours, probably they are more robust than houghlines – Rick M. Feb 18 '19 at 11:55
  • In the github page I mentioned before they used contours to find the License Plate in the beginning. So they should be more robust, please, can you provide a sample code? – Soufiane S Feb 18 '19 at 11:58
  • Yes sure, I am working on the same constrains as the one I used here to find the two vertical lines. – Rick M. Feb 18 '19 at 12:01
  • Thank you for your time sir. – Soufiane S Feb 18 '19 at 12:03
  • Adding this: `dst = cv2.morphologyEx(dst, cv2.MORPH_CLOSE, np.ones((1,5), np.uint8), iterations=1)` after `Canny` and using `lines = cv2.HoughLinesP(dst, 1, np.pi / 360, 20, None, 40, 10)` instead works! – Rick M. Feb 18 '19 at 12:19
  • Using contours, you can actually extract all the characters (including the two vertical lines) individually. – Rick M. Feb 18 '19 at 12:20
  • Your code works perfectly, how can I extract all the characters using the contours? – Soufiane S Feb 18 '19 at 12:29
  • What version of OpenCV are you using? I'm using version 4.0.0 and it gave me slightly different result. – Soufiane S Feb 18 '19 at 12:48
  • 3.4.3, slightly different results? Shouldn't happen – Rick M. Feb 18 '19 at 12:49
  • https://i.stack.imgur.com/QJvFv.jpg I will try to install version 3.4.3 – Soufiane S Feb 18 '19 at 12:57
  • Version 3.4.3 works like a charm, thank you very much sir! I appreciate your help! – Soufiane S Feb 18 '19 at 13:07

This is what I have achieved so far...

(Images: original, detected, cropped, thresh, clean)

The detection in the second image was done using the code found here: License plate detection with OpenCV and Python

The full code (which works from the third image on) is this:

import cv2
import numpy as np
import tesserocr as tr
from PIL import Image

image = cv2.imread("cropped.png")

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow('gray', gray)

thresh = cv2.adaptiveThreshold(gray, 250, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 255, 1)
cv2.imshow('thresh', thresh)

kernel = np.ones((1, 1), np.uint8)
img_dilation = cv2.dilate(thresh, kernel, iterations=1)

# OpenCV 3.x returns three values, OpenCV 4.x two — take the last two
ctrs, hier = cv2.findContours(img_dilation.copy(), cv2.RETR_TREE,
                              cv2.CHAIN_APPROX_SIMPLE)[-2:]

sorted_ctrs = sorted(ctrs, key=lambda ctr: cv2.boundingRect(ctr)[0])

clean_plate = 255 * np.ones_like(img_dilation)

for i, ctr in enumerate(sorted_ctrs):
    x, y, w, h = cv2.boundingRect(ctr)

    roi = img_dilation[y:y + h, x:x + w]

    # these values are specific to this image only — not a general-purpose threshold
    if h > 70 and w > 100:
        rect = cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

        clean_plate[y:y + h, x:x + w] = roi
        cv2.imshow('ROI', rect)

        cv2.imwrite('roi.png', roi)

img = cv2.imread("roi.png")

blur = cv2.medianBlur(img, 1)
cv2.imshow('4 - blur', blur)

pil_img = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

api = tr.PyTessBaseAPI()

try:
    api.SetImage(pil_img)
    boxes = api.GetComponentImages(tr.RIL.TEXTLINE, True)
    text = api.GetUTF8Text()

finally:
    api.End()

# clean the string a bit
text = str(text).strip()

# 77143-1916 ---> NNNNN|symbol|N
firstSection = text[:5]

# the arabic symbol is easy to locate: the OCR almost never misreads the
# last two digits, so the symbol is always the third char from the end
symbol = text[-3]

lastChar = text[-1]

plate = firstSection + "[" + symbol + "]" + lastChar

print(plate)
cv2.waitKey(0)
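If the OCR output varies in spacing, the same NNNNN|symbol|N split can be done with a regular expression instead of fixed indices — a sketch, assuming the plate text is five digits, one non-digit symbol, and one final digit:

```python
import re

def split_plate(text):
    """Split 'NNNNN<symbol>N' OCR output into (number, symbol, last digit)."""
    m = re.match(r"^(\d{5})\s*(\D)\s*(\d)$", text.strip())
    if m is None:
        return None
    return m.groups()

print(split_plate("77143 د 6"))  # ('77143', 'د', '6')
```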

For the Arabic symbol you should install the additional language data for Tesseract (and possibly use version 4 of it).

Output: 77143[9]6

The number between brackets is the (undetected) Arabic symbol.

Hope I helped you.

lucians
  • Thank you very much for the answer, really appreciate your help! – Soufiane S Feb 18 '19 at 20:54
  • I am using pytesseract latest version 0.2.6 and it fails to read the letter `د` in the license plate... How can I solve this? – Soufiane S Feb 18 '19 at 20:58
  • Well, take a look at the [main github page](https://github.com/madmaze/pytesseract#usage), download the necessary tessdata language and set the right flag in your code. – lucians Feb 18 '19 at 21:00
  • Sure, I already have `tessdata/ara.traineddata` and the code `print(pytesseract.image_to_string(Image.open('lines.jpg'), lang='ara'))` returns wrong answer... – Soufiane S Feb 18 '19 at 21:03
  • Invoke @Kinght 金 - He's a master with OpenCv. – lucians Feb 18 '19 at 21:05
  • Ok dear friend, but how can I invoke him? Is he in twitter or in here? If he is here how can I talk to him? – Soufiane S Feb 18 '19 at 21:09
  • Try to quote him in the question.... I don't know, if he wants to answer, good; if not, go ahead with your task. Keep in mind that ANPR is not so easy, and most of the code you will find on the internet needs to be fine-tuned for your images. – lucians Feb 18 '19 at 21:13
  • I have installed tesserocr using the `tesserocr-2.4.0-cp36-cp36m-win32.whl` package and was successful, when I run your code it gives an error in opencv... What version did you use? – Soufiane S Feb 19 '19 at 10:03