I have a set of images similar to this one: (image)

For each image, I have a text file with bounding boxes expressed as normalized coordinates in YOLOv5 format (a text file with one row per box: class, x_center, y_center, width, height). Here's an example:

3 0.1661542727623449 0.6696164480452673 0.2951388888888889 0.300925925925926
3 0.41214353459362196 0.851908114711934 0.2719907407407405 0.2961837705761321

I'd like to obtain a new dataset of masked images, where the bounding box areas from the original image are converted into white pixels and everything else is converted into black pixels. This would be an example of the output image: (image)
I'm sure there is a way to do this in PIL (Pillow) in Python, but I just can't seem to find a way. Would somebody be able to help? Kindest thanks!

Mark Setchell
Ema Ilic
  • It would help if you gave some more clues... `3` is maybe a rectangle? What if first digit is 2, or 1, 0, 4? What's a *"normalised"* pixel thing please? If you want folks to help you, make it easy for them... thank you. – Mark Setchell Aug 10 '22 at 17:41
  • Hi! They are all rectangles, in this case the class is not important and the class value should be ignored. Box coordinates are normalized xywh format (from 0 - 1). If your boxes are in pixels, divide x_center and width by image width, and y_center and height by image height. For example, pixel 50 on the x axis of an image of width 100 pixels has a normalized value of 50/100= 0.5. Either way, I figured out the solution and I will be posting it here later on, thanks for getting back to me though! – Ema Ilic Aug 11 '22 at 10:24
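
The normalization described in the comment above can be inverted to get pixel corners from a label row. Here is a minimal sketch; the helper name `yolo_to_pixel_box` and its parameters are made up for illustration and are not part of YOLOv5 or Pillow:

```python
def yolo_to_pixel_box(x_center, y_center, width, height, img_w, img_h):
    """Convert one normalized YOLO box (values in 0..1) to pixel corners.

    Hypothetical helper for illustration only.
    Returns (left, top, right, bottom) as integers.
    """
    # x values scale with the image width, y values with the image height
    left = (x_center - width / 2) * img_w
    top = (y_center - height / 2) * img_h
    right = (x_center + width / 2) * img_w
    bottom = (y_center + height / 2) * img_h
    return round(left), round(top), round(right), round(bottom)


# A box centered in a 100x200 image, 20% of the width and 40% of the height
print(yolo_to_pixel_box(0.5, 0.5, 0.2, 0.4, 100, 200))  # (40, 60, 60, 140)
```

Rounding is a choice here; truncating with `int()` would also work and can differ by one pixel.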

1 Answer

Here's the solution I ended up with. It builds the mask as a NumPy array and then turns it into a Pillow image:

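import os
import numpy as np
from PIL import Image

# labPath (folder with the label files) and filename (one .txt label file)
# are not defined in this snippet and must be set beforehand.
SIZE = 1152  # image side length in pixels assumed here (square images)

# A uint8 array of zeros is an all-black single-channel image
square = np.zeros((SIZE, SIZE), dtype=np.uint8)
with open(os.path.join(labPath, filename), 'r') as label:
    for line in label:
        parts = line.split()  # parts: class, x_center, y_center, width, height
        if not parts:
            continue  # skip empty lines
        x, y, w, h = (float(v) for v in parts[1:5])  # the class is ignored
        # Clamp at 0: a negative slice index would wrap around to the other edge
        left = max(int((x - 0.5 * w) * SIZE), 0)
        top = max(int((y - 0.5 * h) * SIZE), 0)
        right = int((x + 0.5 * w) * SIZE)
        bot = int((y + 0.5 * h) * SIZE)
        square[top:bot, left:right] = 255  # paint the box white
# A 2-D uint8 array becomes a mode "L" (grayscale) image, so no convert() is needed
square_img = Image.fromarray(square)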
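
For comparison, roughly the same mask could be drawn with Pillow alone using `ImageDraw`. This is only a sketch and not the code from the answer: the function name `mask_from_yolo` and its parameters are made up for illustration, and the image size is passed in instead of being hard-coded.

```python
from PIL import Image, ImageDraw


def mask_from_yolo(label_text, img_w, img_h):
    """Return a black mode-"L" image with each YOLO box filled in white.

    Hypothetical helper for illustration; label_text holds rows of
    "class x_center y_center width height" with normalized values.
    """
    mask = Image.new("L", (img_w, img_h), 0)  # start fully black
    draw = ImageDraw.Draw(mask)
    for row in label_text.splitlines():
        parts = row.split()
        if len(parts) != 5:
            continue  # skip blank or malformed rows
        _cls, xc, yc, w, h = (float(v) for v in parts)  # class is ignored
        left = round((xc - w / 2) * img_w)
        top = round((yc - h / 2) * img_h)
        right = round((xc + w / 2) * img_w)
        bottom = round((yc + h / 2) * img_h)
        # Note: rectangle() includes both corner points, unlike NumPy slicing
        draw.rectangle([left, top, right, bottom], fill=255)
    return mask


# Usage with one box centered in a 100x200 image
demo = mask_from_yolo("3 0.5 0.5 0.2 0.4\n", 100, 200)
print(demo.size, demo.getpixel((50, 100)), demo.getpixel((0, 0)))
```

In a real run the width and height would come from the matching image, for example via `Image.open(path).size`, and the result could be written out with `mask.save(...)`.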

Let me know if you have any questions!

Ema Ilic