Remove colors from image to retain only texts

Question

I have an image with a white background, like the one below. Some of the text is black on an (extra) red background, and some of the text is red. The position of the text (with or without a red background) is not fixed.

I want to produce an image that contains only the text.

One approach I thought of is to replace the red background with white, but then the red text inevitably disappears too.

Here is what I have tried:

from PIL import Image

import numpy as np

orig_color = (255,0,0)
replacement_color = (255,255,255)
img = Image.open("C:\\TEM\\AB.png").convert('RGB')
data = np.array(img)
data[(data == orig_color).all(axis = -1)] = replacement_color
img2 = Image.fromarray(data, mode='RGB')
img2.show()

Result as below:

What is the best way to keep only the text in the picture (ideally as shown below)?

Thank you.

Is it always black and red? Or could there be other colours? — Mark Setchell, Dec 05 '19 at 09:03
How about getting the colour of the top-left corner pixel. If it is red, make red pixels white and leave the (assumed) black letters black. If it is black, make black pixels into white and red pixels into black. This strategy assumes your letters don't ever touch the corners - i.e. that the top-left corner pixel is the background colour. — Mark Setchell, Dec 05 '19 at 09:08
@MarkSetchell, I forgot to mention that the image has a white background, so the top-left corner is white... — Mark K, Dec 05 '19 at 09:22
Ok, can you please post separate, representative images that cover all your cases please? Then we know what we need to deal with. Thank you. — Mark Setchell, Dec 05 '19 at 09:25
If you can detect the red rectangular regions containing black text you can remove the red around them in that area. Afterwards you can replace all the red remaining in the image (i.e. the red text) with black. — martineau, Dec 05 '19 at 10:01

Jonathan Feenstra · Accepted Answer · 2019-12-15T09:56:26.987

Here is my approach using only the red and green channels of the image (using OpenCV, see my comments in the code for the explanation):

import cv2
import imageio
import numpy as np

# extract red and green channel from the image
r, g = cv2.split(imageio.imread('https://i.stack.imgur.com/bMSzZ.png'))[:2]

imageio.imsave('r-channel.png', r)
imageio.imsave('g-channel.png', g)

# white image as canvas for drawing contours
canvas = np.ones(r.shape, np.uint8) * 255

# find contours in the inverted green channel
# change [0] to [1] when using OpenCV 3, where contours are the second return value
contours = cv2.findContours(255 - g, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[0]

# filter out contours that are too large and have length 4 (rectangular),
# i.e. drop the red rectangle but keep the small, many-pointed letter contours
contours = [
    cnt for cnt in contours
    if not (cv2.contourArea(cnt) > 500 and len(cnt) == 4)
]

# fill kept contours with black on the canvas
cv2.drawContours(canvas, contours, -1, 0, -1)

imageio.imsave('filtered-contours.png', canvas)

# combine kept contours with red channel using '&' to bring back the "AAA"
# use '|' with the green channel to remove contour edges around the "BBB"
result = (canvas & r) | g

imageio.imsave('result.png', result)

r-channel.png

g-channel.png

filtered-contours.png

result.png
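
To see why the final `canvas & r | g` line works, it can help to trace a few representative pixel values through it. The sketch below is only an illustration: it assumes pure colours (no anti-aliasing), and the five-element arrays are hypothetical stand-ins for the real `canvas`, `r` and `g` images, one element per kind of pixel.

```python
import numpy as np

# One element per pixel type (hypothetical values, assuming pure colours):
#   0: white background
#   1: red background
#   2: black text on the red background
#   3: red text on the white background
#   4: white pixel that a filled contour spilled onto
canvas = np.array([255, 255, 255, 0, 0], np.uint8)  # kept contours drawn in black
r = np.array([255, 255, 0, 255, 255], np.uint8)     # red channel: only black text is dark
g = np.array([255, 0, 0, 0, 255], np.uint8)         # green channel: everything non-white is dark

# '&' makes a pixel black if it is black in either canvas or r;
# '|' with g turns pixels that were white in the original back to white.
# Note that '&' binds tighter than '|' in Python.
result = canvas & r | g

print(result.tolist())
```

Both kinds of text end up black, both backgrounds end up white, and the spill pixel is restored to white by the `| g` step.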

Update

Here is a more generalisable solution based on another example image you provided in the chat:

import cv2
import numpy as np

img = cv2.imread('example.png')

result = np.ones(img.shape[:2], np.uint8) * 255
for channel in cv2.split(img):
    canvas = np.ones(img.shape[:2], np.uint8) * 255
    contours = cv2.findContours(255 - channel, cv2.RETR_LIST,
                                cv2.CHAIN_APPROX_SIMPLE)[0]
    # size threshold may vary per image
    contours = [cnt for cnt in contours if cv2.contourArea(cnt) <= 100]
    cv2.drawContours(canvas, contours, -1, 0, -1)
    result = result & (canvas | channel)

cv2.imwrite('result.png', result)

Here I no longer filter on contour length, as this causes problems when other characters are touching the rectangle. All channels of the image are used to make it compatible with different colours.
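
As a rough illustration of why `result & (canvas | channel)` keeps both kinds of text, the per-channel combination can be traced on a few representative pixels. This assumes pure colours and that the contour filter behaves as intended (small dark blobs kept, the large rectangle dropped); the arrays are hypothetical stand-ins for real image data, with channels named as in OpenCV's BGR order.

```python
import numpy as np

# One element per pixel type (hypothetical values, assuming pure colours):
#   0: white background, 1: red background,
#   2: black text on red, 3: red text on white
channels = {
    "b": np.array([255, 0, 0, 0], np.uint8),
    "g": np.array([255, 0, 0, 0], np.uint8),
    "r": np.array([255, 255, 0, 255], np.uint8),
}

# Assumed outcome of the contour filter per channel: 0 where a small dark
# blob was kept and filled, 255 elsewhere. In b and g only the red text is
# a small blob (the rectangle plus its black text forms one large blob);
# in r only the black text is dark at all.
canvases = {
    "b": np.array([255, 255, 255, 0], np.uint8),
    "g": np.array([255, 255, 255, 0], np.uint8),
    "r": np.array([255, 255, 0, 255], np.uint8),
}

result = np.full(4, 255, np.uint8)
for name in channels:
    # a pixel stays white unless it is dark in this channel
    # AND lies inside a kept (small) contour
    result &= canvases[name] | channels[name]

print(result.tolist())
```

The size threshold decides which blobs count as "small", so on real images it would still need tuning as noted in the code above.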

Thank you. When I run the code, it gives me the error below. Is it because of my cv2 version, 3.1.0? — Mark K, Dec 06 '19 at 03:45
"OpenCV Error: Assertion failed (npoints >= 0 && (depth == CV_32F || depth == CV_32S)) in cv::contourArea, file C:\builds\master_PackSlaveAddon-win32-vc12-static\opencv\modules\imgproc\src\shapedescr.cpp, line 314 Traceback (most recent call last): File "D:\Python27\script.py", line 34, in if not (cv2.contourArea(cnt) > 500 and len(cnt) == 4) cv2.error: C:\builds\master_PackSlaveAddon-win32-vc12-static\opencv\modules\imgproc\src\shapedescr.cpp:314: error: (-215) npoints >= 0 && (depth == CV_32F || depth == CV_32S) in function cv::contourArea" — Mark K, Dec 06 '19 at 03:45
@MarkK Yes, I think it is because of OpenCV 3.1.0 (see [this answer](https://stackoverflow.com/a/54734716/9504155)). I am using OpenCV 4.1.1. Can you test if it works when you change the `[0]` after `findContours` to `[1]`? If it does, I will update my answer to explain the difference when using OpenCV 3. — Jonathan Feenstra, Dec 06 '19 at 08:33
I upgraded my OpenCV and now your code works perfectly. Thank you for sharing and for your help! — Mark K, Dec 06 '19 at 08:46
Fantastic! Your update is even more effective and universally applicable! Thanks for sharing! — Mark K, Dec 15 '19 at 07:14
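
Regarding the OpenCV version difference discussed in the comments: `cv2.findContours` returns `(image, contours, hierarchy)` in OpenCV 3.x but `(contours, hierarchy)` in 2.x and 4.x, so the contour list is the second-to-last element either way. A small helper along these lines (the name `extract_contours` is made up for this sketch, not part of OpenCV) could avoid editing the index by hand:

```python
def extract_contours(find_contours_result):
    """Return the contour list from a cv2.findContours return value.

    Works for both return shapes:
      OpenCV 3.x:        (image, contours, hierarchy)
      OpenCV 2.x / 4.x:  (contours, hierarchy)
    In both cases the contours are the second-to-last element.
    """
    return find_contours_result[-2]


# Intended usage with the code from the answer (requires OpenCV):
# contours = extract_contours(
#     cv2.findContours(255 - g, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE))
```

With this, indexing `[0]` on OpenCV 3 no longer hands the image rows to `cv2.contourArea`, which is what appears to trigger the assertion error quoted above.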
