I have 5 sample images (approx. 500x300) representing letters that can appear in a larger image (approx. 3000x3000). All images (both samples and larger images) are monochrome. The letters always have the same shape, orientation and size as the sample images. Given a 3000x3000 input image, I would like to get the bounding boxes of the letters that appear in it. My idea is to perform a spatial convolution between the input image and each of the 5 sample images. This is how I perform the convolution between the input and a single sample filter:
import scipy.signal as S
import cv2
sample = cv2.imread('<sample_path>', 0)  # already two-valued B/W
in_image = cv2.imread('<image_path>', 0)
_, in_image = cv2.threshold(in_image, 50, 255, cv2.THRESH_BINARY) # Two valued
kernel1 = cv2.getStructuringElement(cv2.MORPH_RECT, (7, 7))
kernel2 = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
in_image = cv2.erode(in_image, kernel=kernel1)  # erosion
in_image = cv2.dilate(in_image, kernel=kernel2)  # dilation
conv = S.convolve2d(in_image, sample)
I would expect conv to contain peak values where the filter matches the input image (and from those peaks derive the bounding boxes), but this does not happen. I get no error, yet the output image seems to contain only noise.
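For reference, this is the kind of peak-to-bounding-box extraction I have in mind. It is a minimal small-scale sketch (a hypothetical 3x3 "letter" pasted into a 20x20 image, not my real data), and it mean-subtracts the template before correlating, since I suspect a raw correlation of 0/255 images responds to any bright region rather than to the letter shape:

```python
import numpy as np
import scipy.signal as S

# Hypothetical toy data: a 3x3 T-shaped "letter" pasted into a 20x20 image.
sample = np.zeros((3, 3), dtype=np.float64)
sample[0, :] = 1.0
sample[:, 1] = 1.0
in_image = np.zeros((20, 20), dtype=np.float64)
in_image[5:8, 9:12] = sample  # letter occupies rows 5..7, cols 9..11

# Mean-subtracting the template suppresses the flat response that a plain
# all-positive kernel produces over any bright region of the image.
kernel = sample - sample.mean()
conv = S.correlate2d(in_image, kernel, mode='same')

# The correlation peak sits at the center of the matched patch (odd-sized
# kernel, mode='same'); shift back by half the kernel to get the top-left.
peak = np.unravel_index(np.argmax(conv), conv.shape)
h, w = sample.shape
top = peak[0] - h // 2
left = peak[1] - w // 2
bbox = (top, left, top + h, left + w)  # (row0, col0, row1, col1)
print(bbox)  # (5, 9, 8, 12)
```

On my real images I would apply the same idea per sample letter, keeping every local maximum above a threshold instead of only the single global peak.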