
I generated a dataset of 200 x 200 x 3 images in which each image contains a 40 x 40 box of a different color. I want to create a model using TensorFlow that can predict the coordinates of this 40 x 40 box.

The code I used to generate these images:


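import os
from random import choice, randrange

from PIL import Image, ImageDraw

colors = ["#ffd615", "#f9ff21", "#00d1ff",
          "#0e153a", "#fc5c9c", "#ac3f21",
          "#40514e", "#492540", "#ff8a5c",
          "#000000", "#a6fff2", "#f0f696",
          "#d72323", "#dee1ec", "#fcb1b1"]

IMAGE_DIR = "dataset/train_images"
FILE_NAME = "train_annotations.txt"
BOX_SIZE = 40

def generate_image(color):
    # Create a 200 x 200 RGB image filled with a single background color.
    return Image.new(mode="RGB", size=(200, 200), color=color)

def save_image(img, imgname):
    img.save(imgname)

def draw_rect(image, color, x, y):
    # Draw a 40 x 40 box with its top-left corner at (x, y).
    # Returns the image followed by x0, x1, y0, y1.
    draw = ImageDraw.Draw(image)
    coords = ((x, y), (x + BOX_SIZE, y), (x + BOX_SIZE, y + BOX_SIZE), (x, y + BOX_SIZE))
    draw.polygon(coords, fill=color)
    return image, coords[0][0], coords[2][0], coords[0][1], coords[2][1]

# Make sure the output directory exists before saving images.
os.makedirs(IMAGE_DIR, exist_ok=True)

# Open the annotation file once in "w" mode so that re-running the script
# does not append rows that no longer match the image numbering.
with open(FILE_NAME, "w") as f:
    for i in range(100):
        background = choice(colors)
        # Pick a box color different from the background so the box is visible.
        box_color = choice([c for c in colors if c != background])
        img = generate_image(background)
        img, x0, x1, y0, y1 = draw_rect(img, box_color, randrange(200 - 50), randrange(200 - 50))
        save_image(img, os.path.join(IMAGE_DIR, f"img{i}.png"))
        f.write(f"{x0} {x1} {y0} {y1}\n")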

Can anyone suggest how I can build a model that predicts the coordinates of the box in a new image?

2 Answers


The easiest way to separate the box from the background is K-means clustering with K = 2. Record the RGB value of every pixel, then use K-means to split the pixels into two groups: one is the background group, the other is the box color group (the smaller one, since the box covers only about 1,600 of the 40,000 pixels). Map the pixels in the box color group back to their original coordinates and take the mean of those coordinates. That mean is the centre of the 40x40 box, so the corners lie 20 pixels to either side of it.

The TensorFlow documentation for K-means is here: https://www.tensorflow.org/api_docs/python/tf/compat/v1/estimator/experimental/KMeans
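The clustering steps described in this answer can be sketched as follows. This is a minimal illustration written with NumPy rather than the TensorFlow estimator linked above; the function name `locate_box_kmeans`, the centre initialisation, and the iteration count are assumptions made for the example, not part of the original answer.

```python
import numpy as np


def locate_box_kmeans(image, iterations=10):
    """Cluster pixels into 2 color groups; return the (cx, cy) centre of the smaller group.

    `image` can be a PIL RGB image or an (H, W, 3) array. Returns None if only
    one color is present (for example, box and background share a color).
    """
    pixels = np.asarray(image).astype(np.float32)[..., :3]
    height, width, _ = pixels.shape
    flat = pixels.reshape(-1, 3)

    # Assumed initialisation: the first pixel's color and the color farthest from it.
    c0 = flat[0]
    c1 = flat[np.argmax(((flat - c0) ** 2).sum(axis=1))]
    centres = np.stack([c0, c1])

    for _ in range(iterations):
        # Assign every pixel to its nearest centre (squared Euclidean distance in RGB).
        dists = ((flat[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centre to the mean color of its assigned pixels.
        for k in range(2):
            members = flat[labels == k]
            if len(members) > 0:
                centres[k] = members.mean(axis=0)

    counts = np.bincount(labels, minlength=2)
    if counts.min() == 0:
        return None

    # The box is much smaller than the background, so it is the smaller cluster.
    box_label = counts.argmin()
    ys, xs = np.nonzero(labels.reshape(height, width) == box_label)
    return float(xs.mean()), float(ys.mean())


# Hypothetical usage with one of the generated images:
# from PIL import Image
# cx, cy = locate_box_kmeans(Image.open("dataset/train_images/img0.png").convert("RGB"))
# The corners are then roughly (cx - 20, cy - 20) and (cx + 20, cy + 20).
```

Because each image contains only two exact colors, the clustering converges after the first assignment; the loop is kept so the sketch still behaves sensibly on noisier images.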

RajaSJN
  • If you know the colors of the box and the background, it's even easier. You can use a nearest-neighbour assignment (K-nearest neighbours with K = 1) with 2 fixed centres, one being the background color and the other the 40x40 box color. Assign each pixel to the closer centre, then take the mean coordinate of the pixels assigned to the box color to get the location of the 40x40 box. – RajaSJN Dec 23 '22 at 15:31
  • Is there any way to solve this problem using CNNs? – Tushar Verma Dec 23 '22 at 15:33
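For the known-colors case raised in the first comment, the assignment step can be sketched like this. It is only an illustration under the assumption that both colors are given as hex strings from the question's `colors` list; the helper names `hex_to_rgb` and `locate_box_known_colors` are hypothetical.

```python
import numpy as np


def hex_to_rgb(code):
    """Convert a '#rrggbb' string to a float32 RGB vector."""
    code = code.lstrip("#")
    return np.array([int(code[i:i + 2], 16) for i in (0, 2, 4)], dtype=np.float32)


def locate_box_known_colors(image, background_hex, box_hex):
    """Assign each pixel to the nearer of two known colors; return the box centre (cx, cy)."""
    pixels = np.asarray(image).astype(np.float32)[..., :3]
    centres = np.stack([hex_to_rgb(background_hex), hex_to_rgb(box_hex)])
    # Squared distance from every pixel to both centres, shape (H, W, 2).
    dists = ((pixels[:, :, None, :] - centres) ** 2).sum(axis=-1)
    # Index 1 is the box color.
    ys, xs = np.nonzero(dists.argmin(axis=-1) == 1)
    if xs.size == 0:
        return None
    return float(xs.mean()), float(ys.mean())
```

No iteration is needed here because the two centres are fixed in advance.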

It is enough to perform bounding-box regression. For this you just need to add a fully connected layer after the CNN with 4 output values: x1, y1, x2, y2, where (x1, y1) is the top-left corner and (x2, y2) is the bottom-right corner. Something similar can be found here: https://github.com/sabhatina/bounding-box-regression.
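A minimal Keras sketch of this idea is shown below. The layer sizes, the sigmoid output with coordinates scaled to [0, 1], the MSE loss, and the helper names `build_model` and `load_dataset` are all assumptions chosen for illustration; they are not taken from the linked repository. The targets keep the `x0 x1 y0 y1` order of the question's annotation file rather than the x1, y1, x2, y2 order above.

```python
import numpy as np
import tensorflow as tf


def build_model(input_shape=(200, 200, 3)):
    """Small CNN followed by a fully connected layer with 4 box outputs."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        tf.keras.layers.Rescaling(1.0 / 255),           # pixel values to [0, 1]
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        # 4 outputs in [0, 1]; multiply by 200 to get pixel coordinates.
        tf.keras.layers.Dense(4, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model


def load_dataset(image_dir, annotation_file, count):
    """Load img0.png .. img{count-1}.png and their annotation rows."""
    images = np.stack([
        tf.keras.utils.img_to_array(tf.keras.utils.load_img(f"{image_dir}/img{i}.png"))
        for i in range(count)
    ])
    # Each row is "x0 x1 y0 y1"; divide by the image size to normalise.
    boxes = np.loadtxt(annotation_file, dtype=np.float32, ndmin=2)[:count] / 200.0
    return images, boxes


# Hypothetical training and prediction run on the question's dataset:
# images, boxes = load_dataset("dataset/train_images", "train_annotations.txt", 100)
# model = build_model()
# model.fit(images, boxes, epochs=50, batch_size=16, validation_split=0.2)
# x0, x1, y0, y1 = model.predict(images[:1])[0] * 200
```

With only 100 training images the network may overfit, so generating more images with the same script is likely to help.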

wvw321