I'm trying to remove the square boxes (vertical and horizontal lines) from a filled-out form using OpenCV (Python). I am trying to detect the vertical and horizontal lines with OpenCV's morphological operations. The original image

After detecting the vertical and horizontal lines: Horizontal lines

Vertical lines

After the horizontal and vertical lines are detected, I simply add them together and subtract the sum from the processed image:

res = verticle_lines_img + horizontal_lines_img
exp = img_bin - res

The final result is not as clean as expected. Final image after removing H and V lines

The full code for this is

import cv2
import numpy as np

# Read the image
img_for_box_extraction_path='aligned_filled.jpg'
img = cv2.imread(img_for_box_extraction_path, 0)
# Thresholding the image
(thresh, img_bin) = cv2.threshold(img, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
# Invert the image
img_bin = ~img_bin
cv2.imwrite("Image_bin.jpg",img_bin)
bw = cv2.adaptiveThreshold(img_bin, 255, cv2.ADAPTIVE_THRESH_MEAN_C, \
                            cv2.THRESH_BINARY, 15, -2)
horizontal = np.copy(bw)
vertical = np.copy(bw)
# Defining a kernel length for horizontal and vertical
cols = horizontal.shape[1]
horizontal_size = int(cols)
horizontalStructure = cv2.getStructuringElement(cv2.MORPH_RECT, (horizontal_size, 1))
# Apply morphology operations
horizontal = cv2.erode(horizontal, horizontalStructure)
horizontal = cv2.dilate(horizontal, horizontalStructure)
rows = vertical.shape[0]

verticalsize = int(rows)
# Create structure element for extracting vertical lines through morphology operations
verticalStructure = cv2.getStructuringElement(cv2.MORPH_RECT, (1, verticalsize))
# Apply morphology operations
vertical = cv2.erode(vertical, verticalStructure)
vertical = cv2.dilate(vertical, verticalStructure)
#kernel_length = np.array(img).shape[1]//80
#kernel_length = 7
# A vertical kernel of (1 x 6), which will detect all the vertical lines in the image.
verticle_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 6))
# A horizontal kernel of (7 x 1), which will help detect all the horizontal lines in the image.
hori_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (7, 1))
# A kernel of (3 X 3) ones.
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))


# Morphological operation to detect vertical lines from an image
img_temp1 = cv2.erode(img_bin, verticle_kernel, iterations=3)
verticle_lines_img = cv2.dilate(img_temp1, verticle_kernel, iterations=2)
cv2.imwrite("verticle_lines.jpg",verticle_lines_img)
# Morphological operation to detect horizontal lines from an image
img_temp2 = cv2.erode(img_bin, hori_kernel, iterations=3)
horizontal_lines_img = cv2.dilate(img_temp2, hori_kernel, iterations=2)
cv2.imwrite("horizontal_lines.jpg", horizontal_lines_img)


res = verticle_lines_img + horizontal_lines_img
#fin = cv2.bitwise_and(img_bin, img_bin, mask = cv2.bitwise_not(res))
exp = img_bin - res
exp = ~exp
cv2.imwrite("final.jpg",exp)

What could be a novel way to detect and remove the square boxes?

IamKarim1992
  • Once you've detected the lines, dilate the detected lines to make them thicker then use this image as mask to turn that portion of the pixels in the original image to white. – zindarod Jun 12 '19 at 09:29
  • @zindarod In the code I have used cv2.dilate() for both the horizontal and vertical lines once detected, and used that as the mask. Could you show an example? – IamKarim1992 Jun 12 '19 at 09:33
  • Just one point to consider: letters like 'L', 'I', and similar could be detected and deleted by your method, so you probably need something to verify the lines found by your kernels. – Mauro Dorni Jun 12 '19 at 09:44
  • I've managed to get this far; the only problem is that it's removing a bunch of stuff inside the boxes at the moment: https://imgur.com/a/IVKOiRc – chris Jun 12 '19 at 11:36
  • @chris, what methods did you use to get to this? – IamKarim1992 Jun 12 '19 at 16:16
  • I don't have the code anymore, but from memory: you don't need the second threshold, make the kernels (1,8) and (8,1), and reduce iterations in img_temp1/2 to 2. You could also try using the HoughLinesP method to remove lines; there you can easily get line length and angle, which might make it more accurate as to what you delete. – chris Jun 13 '19 at 05:54
  • @chris could you point me to a HoughLinesP implementation for detecting and removing lines? – IamKarim1992 Jun 13 '19 at 11:27
  • Please search the OpenCV docs and try to extend the example yourself. They show you how to draw lines, so you can draw them white, and then all you need to do is filter lines based on length, angle, etc. to make it as accurate as you want. – chris Jun 13 '19 at 13:20
  • could you perhaps update the links? – DaveTheAl Dec 27 '19 at 08:19

1 Answer

The grid lines are thinner than the text, so I suggest the following:

threshold->erode->remove small blobs->dilate

Here is the result of the method described above: [image]

I feel bad to keep providing example code in the wrong language, but here is what generated that result in C++. I think the function calls should be pretty similar in Python. A note on the blob removal in particular: in "How to remove small connected objects using OpenCV", the answerer does it in Python and it is way cleaner than mine, so I suggest you refer to that to remove your small blobs. I removed anything smaller than 15 px, which was arbitrary and simply the first value I tried. I may have killed some characters (I didn't check) with that high a limit, so you will want to find the right value for your purposes.

#include <opencv2/opencv.hpp>
#include <vector>

using namespace cv;

int main(int argc, char** argv)
{
    Mat image = imread("../../resources/images/fullForm.jpg", IMREAD_GRAYSCALE);

    Mat thresholded, errodedImage, openedImage;
    threshold(image, thresholded, 200, 255, THRESH_BINARY_INV);

    //erode first
    erode(thresholded, errodedImage, getStructuringElement(MORPH_CROSS, Size(3, 3)), cv::Point(-1, -1), 1);

    //delete any blobs with less than 15 px
    Mat labels, stats, centroids;
    Mat deblobbedImage = errodedImage.clone();
    int nccomps = connectedComponentsWithStats(errodedImage, labels, stats, centroids);
    std::vector<int> smallBlobs = std::vector<int>();
    for (int i = 0; i < nccomps; i++)
    {
        if (stats.at<int>(i, CC_STAT_AREA) < 15)
        {
            smallBlobs.push_back(0);
        }
        else
        {
            smallBlobs.push_back(1);
        }
    }

    for (int y = 0; y < errodedImage.rows; y++)
    {
        for (int x = 0; x < errodedImage.cols; x++)
        {
            int label = labels.at<int>(y, x);
            CV_Assert(0 <= label && label < nccomps);
            if (smallBlobs[label] == 0)
            {
                deblobbedImage.at<uchar>(y, x) = 0;
            }
        }
    }

    //dilate to restore text
    dilate(deblobbedImage, openedImage, getStructuringElement(MORPH_CROSS, Size(3, 3)), cv::Point(-1, -1), 1);

    imshow("source", image);
    imshow("Thresholded", thresholded);
    imshow("erroded", errodedImage);
    imshow("deblobbed", deblobbedImage);
    imshow("finished", openedImage);
    waitKey(0);
    return 0;
}
Sneaky Polar Bear