Borders drawn across the text, as in the following images, give very poor OCR results.

[Two images: sample inputs with borders drawn across the text]

So I am using JavaCV (a Java wrapper for OpenCV) to remove the borders and boxes around text in an image. The results were quite satisfactory, but the problem I am facing now is that it also removes horizontal and vertical strokes of the text itself, as in the following example.

[Image: example result in which parts of the text were removed]

The horizontal lines that were removed have been redrawn in a different color.

I follow these steps to remove the borders:

  1. Find the horizontal and vertical contours by specifying the height and width of the contours.
  2. Fill the contours with white.

My code is below.

public void removeBorder( String filePath )
{
    Mat grayImage = Imgcodecs.imread( filePath, Imgcodecs.IMREAD_GRAYSCALE );
    Mat thresholdInverted = new Mat();
    // With THRESH_OTSU the threshold is computed automatically; the 127.0 argument is ignored.
    Imgproc.threshold( grayImage, thresholdInverted, 127.0, 255.0, Imgproc.THRESH_BINARY_INV + Imgproc.THRESH_OTSU );
    Imgcodecs.imwrite( "E:/threholded.jpg", thresholdInverted );

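    // Opening with a 5x1 kernel keeps horizontal runs at least 5 px wide.
    List<MatOfPoint> horizontalContours = morphOpenAndFindContours( thresholdInverted, new Size( 5, 1 ));

    // Opening with a 1x10 kernel keeps vertical runs at least 10 px tall.
    List<MatOfPoint> verticalContours = morphOpenAndFindContours( thresholdInverted, new Size( 1, 10 ));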

    this.drawWhiteContours( verticalContours, grayImage );
    this.drawWhiteContours( horizontalContours, grayImage );
    Imgcodecs.imwrite( "E:/result.jpg", grayImage );
}

private List<MatOfPoint> morphOpenAndFindContours( Mat img, Size kSize)
{
    Mat kernel = Imgproc.getStructuringElement( Imgproc.MORPH_RECT, kSize );

    Mat openedImage = new Mat();
    Imgproc.morphologyEx( img, openedImage, Imgproc.MORPH_OPEN, kernel, new Point( -1, -1 ), 1 );
    Mat dilateKernel = Imgproc.getStructuringElement( Imgproc.MORPH_RECT, new Size( 5, 5 ) );

    Imgproc.dilate( openedImage, openedImage, dilateKernel );

    List<MatOfPoint> contours = new ArrayList<>();

    Imgproc.findContours( openedImage, contours, new Mat(), Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE );

    return contours;
}


private void drawWhiteContours( List<MatOfPoint> contours, Mat image )
{
    for ( int i = 0; i < contours.size(); i++ ) {
        Imgproc.drawContours( image, contours, i, new Scalar( 255 ), -1 );
    }
}

So how can I remove only the borders without affecting the text? A solution in Java is preferable, but Python is fine too.

Arun Gowda

1 Answer

I think a more robust approach would be to first detect edges and then find contours.

After this, you should find the contours that correspond to the rectangles. To do this, you could compare the areas of all the contours and find the most common one, which will most likely be the area of the rectangles, since they are all the same size.
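
The "most common area" step could be sketched roughly as below. This is only an illustration under some assumptions of mine: the areas are taken to have been computed already (for example with `Imgproc.contourArea` on each contour returned by `Imgproc.findContours`), and the class name `CommonAreaFilter`, the bin width and the minimum-area cutoff are all hypothetical. Since measured areas are rarely exactly equal, the sketch groups them into bins and picks the fullest bin; the cutoff is there so that the many small character contours do not win the vote.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CommonAreaFilter {

    /**
     * Returns the indices of the areas that fall into the most populated bin.
     *
     * areas    contour areas in square pixels (assumed precomputed)
     * binWidth hypothetical tolerance: areas are grouped by round(area / binWidth)
     * minArea  hypothetical cutoff: smaller areas (e.g. single characters) are ignored
     */
    public static List<Integer> indicesOfMostCommonArea(double[] areas, double binWidth, double minArea) {
        Map<Long, List<Integer>> bins = new HashMap<>();
        for (int i = 0; i < areas.length; i++) {
            if (areas[i] < minArea) {
                continue; // too small to be one of the boxes
            }
            long bin = Math.round(areas[i] / binWidth);
            bins.computeIfAbsent(bin, k -> new ArrayList<>()).add(i);
        }

        // Pick the bin with the most members; an empty list means nothing qualified.
        List<Integer> best = new ArrayList<>();
        for (List<Integer> members : bins.values()) {
            if (members.size() > best.size()) {
                best = members;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up areas: three similar boxes, two small glyphs, one large outlier.
        double[] areas = {1200.0, 35.0, 1210.0, 1195.0, 40.0, 8000.0};
        System.out.println(indicesOfMostCommonArea(areas, 100.0, 500.0)); // prints [0, 2, 3]
    }
}
```

The returned indices could then be passed one at a time to `Imgproc.drawContours`, so that only the box contours are painted white. Both tuning values would need adjusting to the actual image resolution.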

marco romelli