Text Documents Image Alignment

Question

I am trying different approaches to align images that contain text using computer vision. I have tested the following image alignment approaches:

Probabilistic Hough line transform to align images according to the detected lines. My implementation is at https://medium.com/p/97b61eeffb20, but it did not work as expected.
SIFT and ORB feature matching to detect and align images against a template image. Instead of aligning all images, it sometimes distorts the image. I used https://pyimagesearch.com/2020/08/31/image-alignment-and-registration-with-opencv/ as a reference.
Edge detection followed by contour detection, corner detection, and a perspective transformation. This does not work with images that have different background types. Reference example: https://pyimagesearch.com/2014/09/01/build-kick-ass-mobile-document-scanner-just-5-minutes/
Morphology followed by contour detection and masking. Reference: Crop exactly document paper from image
Trained a YOLO (You Only Look Once) object detector to detect the documents, but it only predicts a bounding box. What I need is a quadrilateral defined by the four document corners, from which I can align the document using a perspective transform.
Calculating the skew angle and deskewing. Reference: https://github.com/sbrunner/deskew
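
For concreteness, the angle-estimation step shared by the Hough-line and deskew approaches can be sketched roughly as below. This is a minimal illustration, not the code from the linked implementations: `estimate_skew_angle` and its `max_abs_angle` parameter are hypothetical names, and the input is assumed to be a list of `(x1, y1, x2, y2)` segments such as those produced by `cv2.HoughLinesP`.

```python
import math
from statistics import median


def estimate_skew_angle(segments, max_abs_angle=45.0):
    """Estimate text skew as the median angle (in degrees) of line segments.

    `segments` is an iterable of (x1, y1, x2, y2) tuples (assumed input
    format). Segments steeper than `max_abs_angle` are ignored, on the
    assumption that they are borders or noise rather than text lines.
    """
    angles = []
    for x1, y1, x2, y2 in segments:
        if x1 == x2 and y1 == y2:
            continue  # skip zero-length segments
        angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
        # Fold into (-90, 90] so the order of the two endpoints is irrelevant.
        if angle > 90:
            angle -= 180
        elif angle <= -90:
            angle += 180
        if abs(angle) <= max_abs_angle:
            angles.append(angle)
    # The median is less sensitive to stray lines than the mean.
    return median(angles) if angles else 0.0
```

The sign of the result depends on the image coordinate convention (y grows downward) and on the rotation function used afterwards, so the correction direction should be checked on a test image.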

However, none of the above approaches could perfectly align images of identity documents (citizenship documents, passports, licenses, etc.) taken against different backgrounds.

This is a sample test image (important information is hidden for privacy reasons).

Are there any other image alignment approaches that can align document images perfectly by correcting the skew of the text? My main goal is to extract the information from the document using OCR while preserving the order of the information in the document image. Thank you!

If you want to straighten that image, you can: 1) load the image, convert it to grayscale, apply a Gaussian blur and Otsu's threshold, find contours, find the rotated bounding rectangle, then perform a four-point perspective transform to obtain a bird's-eye view of the image; or 2) find corner points with Shi-Tomasi corner detection, then apply a perspective transform. - nathancy, May 04 '22 at 08:52
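
Both routes in the comment above end with four corner points that have to be put in a consistent order and mapped to an upright rectangle. The sketch below shows only that geometric step, in plain Python. The function names `order_corners` and `destination_rectangle` are hypothetical, and the ordering heuristic assumes the document is rotated by less than roughly 45 degrees.

```python
import math


def order_corners(points):
    """Order four (x, y) points as top-left, top-right, bottom-right, bottom-left.

    Heuristic: top-left has the smallest x + y, bottom-right the largest;
    top-right has the smallest y - x, bottom-left the largest.
    """
    pts = [tuple(p) for p in points]
    if len(pts) != 4:
        raise ValueError("expected exactly four corner points")
    top_left = min(pts, key=lambda p: p[0] + p[1])
    bottom_right = max(pts, key=lambda p: p[0] + p[1])
    top_right = min(pts, key=lambda p: p[1] - p[0])
    bottom_left = max(pts, key=lambda p: p[1] - p[0])
    return [top_left, top_right, bottom_right, bottom_left]


def destination_rectangle(ordered):
    """Return the upright target corners and the (width, height) of the output."""
    tl, tr, br, bl = ordered

    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    # Use the longer of each pair of opposite sides so no content is squeezed.
    width = round(max(dist(tl, tr), dist(bl, br)))
    height = round(max(dist(tl, bl), dist(tr, br)))
    corners = [(0, 0), (width - 1, 0), (width - 1, height - 1), (0, height - 1)]
    return corners, (width, height)
```

With OpenCV, the ordered source corners and the returned target corners would typically be converted to `float32` arrays and passed to `cv2.getPerspectiveTransform`, and the resulting matrix plus the `(width, height)` size to `cv2.warpPerspective`.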

score 0 · Answer 1 · answered May 04 '22 at 08:37

To me, the third approach seems the most promising. But as you said, a cluttered background is a problem. Two ideas came to mind:

Implement a GUI as a fallback solution, so the user can select the contour manually.
Render an artificial dataset of official documents against cluttered backgrounds and train a CNN to predict a segmentation map of the document. This map could then be used as an initialization for the edge detection / contour detection. This answer contains two links to databases of images of official documents. Maybe they are of some use to you.
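
As a rough illustration of the second idea, the label side of such a synthetic dataset could be generated as sketched below: pick a random document quadrilateral and rasterize the matching ground-truth segmentation mask. This is only an assumption about how one might do it. `random_document_quad`, `rasterize_mask`, and the `jitter` parameter are hypothetical names, and pasting a real document image onto a background (for example with `cv2.getPerspectiveTransform` and `cv2.warpPerspective`) is left out.

```python
import random


def random_document_quad(width, height, jitter=0.2, rng=random):
    """Return corners (TL, TR, BR, BL), each within `jitter` of an image corner.

    Keeping each corner close to its own image corner keeps the
    quadrilateral convex for small jitter values such as the default.
    """
    dx, dy = jitter * width, jitter * height
    return [
        (rng.uniform(0, dx), rng.uniform(0, dy)),
        (rng.uniform(width - dx, width), rng.uniform(0, dy)),
        (rng.uniform(width - dx, width), rng.uniform(height - dy, height)),
        (rng.uniform(0, dx), rng.uniform(height - dy, height)),
    ]


def rasterize_mask(quad, width, height):
    """Return a height x width grid of 0/1 marking pixels inside `quad`.

    `quad` must be convex and ordered TL, TR, BR, BL in image coordinates
    (y grows downward). A pixel is inside when it lies on the inner side
    of all four edges; boundary pixels count as inside.
    """
    def inside(x, y):
        for i in range(4):
            (ax, ay), (bx, by) = quad[i], quad[(i + 1) % 4]
            if (bx - ax) * (y - ay) - (by - ay) * (x - ax) < 0:
                return False
        return True

    return [[1 if inside(x, y) else 0 for x in range(width)]
            for y in range(height)]
```

Each generated quadrilateral gives both the training target (the mask) and the four corner coordinates, which is the quadrilateral output the question asks for.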
