how to detect words in an image with OpenCV and Tesseract properly

Question

I'm working on an application which reads an image file with OpenCV and processes the words on it with Tesseract. With the following code Tesseract detects extra rectangles which don't contain text.

void Application::Application::OpenAndProcessImageFile(void)
{
    OPENFILENAMEA ofn;
    ZeroMemory(&ofn, sizeof(OPENFILENAMEA));

    char szFile[260] = { 0 };
    // Initialize remaining fields of OPENFILENAMEA structure
    ofn.lStructSize     = sizeof(ofn);
    ofn.hwndOwner       = mWindow->getHandle();
    ofn.lpstrFile       = szFile;
    ofn.nMaxFile        = sizeof(szFile);
    ofn.lpstrFilter     = "JPG\0*.JPG\0PNG\0*.PNG\0";
    ofn.nFilterIndex    = 1;
    ofn.lpstrFileTitle  = NULL;
    ofn.nMaxFileTitle   = 0;
    ofn.lpstrInitialDir = NULL;
    ofn.Flags           = OFN_PATHMUSTEXIST | OFN_FILEMUSTEXIST;

    //open the picture dialog and select the image
    if (GetOpenFileNameA(&ofn) == TRUE) {
        std::string filePath = ofn.lpstrFile;
        
        //load image
        mImage = cv::imread(filePath.c_str());

        //process image     
        tesseract::TessBaseAPI ocr = tesseract::TessBaseAPI();

        ocr.Init(NULL, "eng");
        ocr.SetImage(mImage.data, mImage.cols, mImage.rows, 3, mImage.step);

        Boxa* bounds = ocr.GetWords(NULL);
        for (int i = 0; i < bounds->n; ++i) {
            Box* b = bounds->box[i];
            cv::rectangle(mImage, { b->x,b->y,b->w,b->h }, { 0, 255, 0 }, 2);
        }

        ocr.End();
        
        //show image
        cv::destroyAllWindows();
        cv::imshow("İşlenmiş Resim", mImage);
    }
}

And here is the output image

As you can see Tesseract processes areas which don't contain words at all. How can I fix this?

score 3 · Accepted Answer · answered Nov 18 '21 at 13:26

3

Tesseract is based on character recognition more than text detection. Even there is no text in some areas tesseract can see some features as a text.

What you need to do is that using a text detection algorithm to detect text areas first and then apply tesseract. Here is a tutorial for a dnn model for text detection which is really good.

I quickly applied your image to this and here is the output:

You can get more better results by changing the input parameters of the model. I just used the default ones.

answered Nov 18 '21 at 13:26

Yunus Temurlenk

4,085
4
18
39

thanks for the answer. Can you explain what are the confThreshold and nmsThreshold arguments for? – Caner Kurt Nov 18 '21 at 16:18
its a score_threshold actually, check [here](https://docs.opencv.org/3.4/d6/d0f/group__dnn.html#ga9d118d70a1659af729d01b10233213ee) – Yunus Temurlenk Nov 18 '21 at 17:37

how to detect words in an image with OpenCV and Tesseract properly

1 Answers1

Linked