I have 5 sample images (approx. 500x300) representing letters that can appear in a larger image (approx. 3000x3000). All images (both samples and larger images) are monochrome. The letters always have the same shape, orientation and size as the sample images. Given a 3000x3000 input image, I would like to get the bounding boxes of the letters that appear in it. My idea is to perform a spatial convolution between the input image and each of the 5 sample images. This is how I perform the convolution between the input and a single sample filter:
import scipy.signal as S
import cv2
sample = cv2.imread('<sample_path>', 0)  # already two-valued B/W
in_image = cv2.imread('<image_path>', 0)
_, in_image = cv2.threshold(in_image, 50, 255, cv2.THRESH_BINARY) # Two valued
kernel1 = cv2.getStructuringElement(cv2.MORPH_RECT, (7, 7))
kernel2 = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
in_image = cv2.erode(in_image, kernel=kernel1)  # erosion
in_image = cv2.dilate(in_image, kernel=kernel2)  # dilation
conv = S.convolve2d(in_image, sample)
I would expect conv to contain peak values where the filter matches the input image (and from those peaks derive the bounding boxes), but this does not happen. I get no error, yet the output image seems to contain only noise.
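For reference, this is the kind of peak-to-bounding-box extraction I have in mind. It is a minimal small-scale sketch (a hypothetical 3x3 "letter" pasted into a 20x20 image, not my real data), and it mean-subtracts the template before correlating, since I suspect a raw correlation of 0/255 images responds to any bright region rather than to the letter shape:

```python
import numpy as np
import scipy.signal as S

# Hypothetical toy data: a 3x3 T-shaped "letter" pasted into a 20x20 image.
sample = np.zeros((3, 3), dtype=np.float64)
sample[0, :] = 1.0
sample[:, 1] = 1.0
in_image = np.zeros((20, 20), dtype=np.float64)
in_image[5:8, 9:12] = sample  # letter occupies rows 5..7, cols 9..11

# Mean-subtracting the template suppresses the flat response that a plain
# all-positive kernel produces over any bright region of the image.
kernel = sample - sample.mean()
conv = S.correlate2d(in_image, kernel, mode='same')

# The correlation peak sits at the center of the matched patch (odd-sized
# kernel, mode='same'); shift back by half the kernel to get the top-left.
peak = np.unravel_index(np.argmax(conv), conv.shape)
h, w = sample.shape
top = peak[0] - h // 2
left = peak[1] - w // 2
bbox = (top, left, top + h, left + w)  # (row0, col0, row1, col1)
print(bbox)  # (5, 9, 8, 12)
```

On my real images I would apply the same idea per sample letter, keeping every local maximum above a threshold instead of only the single global peak.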