The following solution is neither perfect nor generic, but I hope it is good enough for your needs.
To remove the lines, I suggest using cv2.connectedComponentsWithStats to find clusters, and then masking out the wide or long clusters.
The solution uses the following stages:
- Convert image to Grayscale.
- Apply a threshold and invert the polarity.
  Use automatic thresholding by applying the flag cv2.THRESH_OTSU.
- Use "close" morphological operation to close small gaps.
- Find connected components (clusters) with statistics.
- Iterate over the clusters, and delete clusters with a large width or a large height.
  Also remove very small clusters, which are considered to be noise.
- Clean the top and left margins "manually".
Here is the code:
import numpy as np
import cv2
img = cv2.imread('Heshbonit.jpg') # Read input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) # Convert to Grayscale.
ret, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU) # Convert to binary and invert polarity
# Use "close" morphological operation to close small gaps
thresh = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, np.ones((1, 2), np.uint8))  # 1x2 kernel: close small horizontal gaps
thresh = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, np.ones((2, 1), np.uint8))  # 2x1 kernel: close small vertical gaps
nlabel, labels, stats, centroids = cv2.connectedComponentsWithStats(thresh, connectivity=8)
thresh_size = 100  # Clusters wider or taller than this (in pixels) are treated as lines
# Delete all lines by filling wide and long clusters with zeros.
# Delete very small clusters (assumed to be noise).
for i in range(1, nlabel):  # Label 0 is the background, so skip it
    if (stats[i, cv2.CC_STAT_WIDTH] > thresh_size) or (stats[i, cv2.CC_STAT_HEIGHT] > thresh_size):
        thresh[labels == i] = 0
    if stats[i, cv2.CC_STAT_AREA] < 4:
        thresh[labels == i] = 0
# Clean left and top margins "manually":
thresh[:, 0:30] = 0
thresh[0:10, :] = 0
# Invert polarity (back to dark text on a white background)
thresh = 255 - thresh
# Write result to file
cv2.imwrite('thresh.png', thresh)