How Can I Convert Dataset Annotations To Fixed (YOLOv5) Format Without Hand Encoding

Question

I am working on an object detection project where the primary task is to identify brand logos. After some research, I found this dataset for brand logos. More about the dataset: here

DATASET:

This dataset has 2 versions:

FlickrLogos32

FlickrLogos47(recommended for brand detection)

As the names suggest, 32 and 47 are the number of classes in each version. The documentation itself says that the 47 version is correctly annotated and recommended for object detection and recognition, so I used the 47 version in my project.

Model:

I am using YOLOv5 for object detection. The reason for using YOLOv5 rather than previous versions is that it is well documented, with a couple of tutorials and Jupyter notebooks available.

Problem:

For the YOLOv5 object detection model, the object label should be annotated as
<x_center> <y_center> <width> <height>, corresponding to the bounding box (image below),
whereas the dataset annotations are given in the form
<x1> <y1> <x2> <y2>
where <x1>,<y1>: upper left corner of the bounding box
<x2>,<y2>: lower right corner of the bounding box.

How can I transform the corner points <x1>,<y1>,<x2>,<y2> of the bounding box into the native YOLO annotation format, i.e. <x_center>,<y_center>,<width>,<height>, without manually going over the images one by one and drawing rectangles with Roboflow?
Also, the labels are annotated in pixels, so they have to be normalized to the range (0,1).

Dataset Insights:

Each dataset example has an image (.png) and a ground-truth label (.txt) (see image below).

The '.mask' file is just a binary mask of the object present in the image.

So a data example looks like:
Image:

gt_data.txt:

Mask:

In my approach, the height and width could be calculated as follows: (x2-x1) gives the width and (y2-y1) gives the height. But what about center_x and center_y? Does taking half of the above results, i.e. (x2-x1)/2 or (y2-y1)/2, give the actual centers? Please let me know. – TeraCotta, Oct 26 '21 at 20:33

score 1 · Answer 1 · answered Oct 27 '21 at 01:58


In general, the center should be calculated as xmin + (width/2) and ymin + (height/2). (x2-x1)/2 on its own is only half the width; it still has to be added to xmin. So I think you have your /2 in the wrong part of the equation.

Also note that the box coordinates in a YOLO annotation look like this (in an actual label file, each line also starts with the integer class index):

0.642859 0.079219 0.148063 0.148062

The coordinates are relative to the size of the photo, in the range 0-1. To normalize them, divide the x values (x center and box width) by the photo width and the y values (y center and box height) by the photo height.
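
As an illustration of the two steps above (center from the corner plus half the size, then division by the image size), here is a minimal sketch. The function name `corners_to_yolo` and the sample numbers are made up for this example and do not come from the dataset.

```python
def corners_to_yolo(x1, y1, x2, y2, img_width, img_height):
    """Convert pixel corners (upper left, lower right) to normalized
    YOLO box values (x_center, y_center, width, height)."""
    box_width = x2 - x1
    box_height = y2 - y1
    # The center is the upper left corner shifted by half the box size.
    x_center = x1 + box_width / 2
    y_center = y1 + box_height / 2
    # x values are divided by the image width, y values by the image height.
    return (
        x_center / img_width,
        y_center / img_height,
        box_width / img_width,
        box_height / img_height,
    )


# Hypothetical box with corners (200, 100) and (300, 150) in a 1000x500 px image.
print(corners_to_yolo(200, 100, 300, 150, 1000, 500))
# prints (0.25, 0.25, 0.1, 0.1)
```

Note that (x2-x1)/2 alone would give 50 for this box, which is half the width and not the x center (250).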

alexheat


I only have the dataset annotations as `<x1> <y1> <x2> <y2>`, and I don't know whether the `height & width` I am calculating are correct. Also, how do I figure out the `x_min` & `y_min` you mentioned above, and what is the normalizing procedure? Could you elaborate so I can get a more intuitive answer? – TeraCotta Oct 27 '21 at 05:11

Per the info you provided above, <x1>,<y1> is the upper left corner of the bounding box, so x1 is xmin and y1 is ymin, and x2 is xmax and y2 is ymax. In order to convert something to YOLO format you must know the height and width of the image. If you have the image, you can get its height and width using the cv2 library. I have a sample of this kind of code here: https://github.com/pylabel-project/pylabel/blob/main/pylabel/importer.py And here is a snippet: `im = cv2.imread(str(image_path))` followed by `img_height, img_width, img_depth = im.shape` – alexheat Oct 28 '21 at 03:41

I am building a Python library called PyLabel to help people like you with these labeling transformation tasks. See https://github.com/pylabel-project/pylabel. If you contact me I will try to add support for this annotation format. You should be able to find my contact info from the github page. – alexheat Oct 28 '21 at 03:53
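
Putting the comments together, the image-size lookup with cv2 can be combined with the corner-to-center conversion to rewrite a whole ground-truth file. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, and it assumes that each ground-truth line starts with `x1 y1 x2 y2` followed by the class index as the fifth field. Check the dataset's own documentation for the actual column layout before relying on it.

```python
from pathlib import Path


def image_size(image_path):
    """Return (width, height) in pixels, read with OpenCV."""
    import cv2  # imported here so the text conversion works without OpenCV

    im = cv2.imread(str(image_path))
    if im is None:  # cv2.imread returns None when the file cannot be read
        raise FileNotFoundError(f"could not read image: {image_path}")
    img_height, img_width = im.shape[:2]
    return img_width, img_height


def convert_annotation_text(text, img_width, img_height):
    """Convert lines of 'x1 y1 x2 y2 class_id ...' (ASSUMED layout)
    into YOLO lines 'class_id x_center y_center width height'."""
    out = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        x1, y1, x2, y2 = map(float, fields[:4])
        class_id = int(fields[4])  # assumption: fifth field is the class index
        w, h = x2 - x1, y2 - y1
        xc, yc = x1 + w / 2, y1 + h / 2
        out.append(
            f"{class_id} {xc / img_width:.6f} {yc / img_height:.6f} "
            f"{w / img_width:.6f} {h / img_height:.6f}"
        )
    return "\n".join(out)


def convert_file(image_path, gt_path, out_path):
    """Write a YOLO label file for one image and its ground-truth file."""
    img_width, img_height = image_size(image_path)
    labels = convert_annotation_text(Path(gt_path).read_text(), img_width, img_height)
    Path(out_path).write_text(labels + "\n")
```

Calling `convert_file` once per image in a loop would then avoid redrawing any boxes by hand. If the class index in the dataset does not start at 0, it may need to be shifted to match the class list in the YOLOv5 data config.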
