I have a Python script that creates a dataset of images and matching labels for training a YOLOv8 model, and I have an issue with the bounding boxes. The boxes look correct in the val_batch0_labels.jpg image (they fit nicely around the cards):
However, in the labels.jpg image the shape of the bounding boxes is wrong, and this affects training and predictions on unseen images. The cards are clearly rectangular, but the boxes there look much closer to square (especially for the vertical cards). Here is the labels.jpg image (see the top-right panel):
Here is an example label .txt file for a vertical card:

`0 0.564145 0.623561 0.379386 0.394737`

Here is an example label .txt file for a horizontal card:

`1 0.396107 0.456620 0.526316 0.284539`
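Each line follows the YOLO format `class x_center y_center width height`, with x values normalized by the image width and y values by the image height. The last two numbers are the width and height of the bounding box, and these are what look wrong in labels.jpg.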
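Because the x values are divided by the image width and the y values by the image height, `h_norm / w_norm` equals the card's pixel aspect ratio only when the image itself is square. The minimal sketch below uses hypothetical sizes (a 200 x 300 px card on a 600 x 900 px image) to compare the two ratios. `square_canvas_rect` is a hypothetical helper that draws a box's normalized width and height centred on a square canvas; it is an assumption about how a summary plot such as the top-right panel of labels.jpg could be rendered, not the Ultralytics code.

```python
def normalized_size(w_px, h_px, image_size):
    """Normalize a pixel box size as in the YOLO label format."""
    img_w, img_h = image_size
    return w_px / img_w, h_px / img_h


def square_canvas_rect(w_norm, h_norm, canvas=1000):
    """Hypothetical helper: corners of a normalized box centred on a square canvas.

    Drawing in normalized units discards the image's own aspect ratio.
    """
    centre = canvas / 2
    half_w = w_norm * canvas / 2
    half_h = h_norm * canvas / 2
    return (centre - half_w, centre - half_h, centre + half_w, centre + half_h)


# Hypothetical sizes: a portrait 200 x 300 px card on a portrait 600 x 900 px image.
image_size = (600, 900)
card_w, card_h = 200, 300

w_norm, h_norm = normalized_size(card_w, card_h, image_size)
print(f"pixel aspect (h / w):      {card_h / card_w:.3f}")  # 1.500
print(f"normalized aspect (h / w): {h_norm / w_norm:.3f}")  # 1.000
print("square-canvas rectangle:", [round(v, 1) for v in square_canvas_rect(w_norm, h_norm)])
```

With these numbers the card is 1.5 times taller than it is wide in pixels, yet its normalized box is exactly square.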
Here is the code; `w_norm` and `h_norm` are the normalized width and height of the bounding box:
```python
import os
import random

from PIL import Image

# cards_dir, labels_dir, image_size (width, height), margin and
# num_images_per_card are defined earlier in the script.
for card_type in ['v', 'h']:
    card_files = [f for f in os.listdir(cards_dir) if f.startswith(card_type)]
    for card_file in card_files:
        for i in range(num_images_per_card):
            # Open raw card image and randomly rotate it
            card_path = os.path.join(cards_dir, card_file)
            card = Image.open(card_path).convert("RGBA")
            angle = random.randint(0, 0)  # always 0, so rotation is currently disabled
            # With a non-zero angle, expand=True enlarges the image to fit the rotated card
            rotated_card = card.rotate(angle, expand=True)

            # Compute position of rotated card on white background
            x_range = image_size[0] - (2 * margin) - rotated_card.width
            y_range = image_size[1] - (2 * margin) - rotated_card.height
            x_pos = random.randint(margin, margin + x_range)
            y_pos = random.randint(margin, margin + y_range)

            # Create white background image and paste rotated card onto it
            background = Image.new("RGB", image_size, (255, 255, 255))
            background.paste(rotated_card, (x_pos, y_pos), rotated_card)

            # Compute bounding box width and height (in pixels)
            w, h = rotated_card.size

            # Compute and save label file
            label_path = os.path.join(labels_dir, f"{card_file[:-4]}-pos-{i + 1}.txt")
            x1, y1 = x_pos, y_pos
            x2, y2 = x_pos + w, y_pos + h
            # Despite the names, x1_norm / y1_norm are the normalized box centre
            x1_norm = (x1 + (w / 2)) / image_size[0]
            y1_norm = (y1 + (h / 2)) / image_size[1]
            w_norm = (x2 - x1) / image_size[0]
            h_norm = (y2 - y1) / image_size[1]
            card_type_code = "0" if card_type == "v" else "1"
            label = f"{card_type_code} {x1_norm:.6f} {y1_norm:.6f} {w_norm:.6f} {h_norm:.6f}\n"
            with open(label_path, "w") as f:
                f.write(label)
```
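To check whether a written label really matches the pasted card, the label can be decoded back to pixel corners and compared with the rectangle passed to `paste`. The stdlib-only sketch below does that; `yolo_to_pixel_box` and `check_label` are hypothetical helper names, and the sizes and positions in the example are made up. The label string is built with the same formulas as the loop above.

```python
def yolo_to_pixel_box(label_line, image_size):
    """Decode 'class xc yc w h' (normalized) into (class_id, x1, y1, x2, y2) in pixels."""
    img_w, img_h = image_size
    class_id, xc, yc, w, h = label_line.split()
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    return int(class_id), xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2


def check_label(label_line, image_size, x_pos, y_pos, card_size, tol=1.0):
    """Return True if the decoded box matches the pasted card within `tol` pixels."""
    _, x1, y1, x2, y2 = yolo_to_pixel_box(label_line, image_size)
    w, h = card_size
    expected = (x_pos, y_pos, x_pos + w, y_pos + h)
    return all(abs(a - b) <= tol for a, b in zip((x1, y1, x2, y2), expected))


# Made-up example: a 200 x 300 px card pasted at (50, 80) on a 600 x 900 px image.
image_size = (600, 900)
x_pos, y_pos, w, h = 50, 80, 200, 300
label = (f"0 {(x_pos + w / 2) / image_size[0]:.6f} {(y_pos + h / 2) / image_size[1]:.6f} "
         f"{w / image_size[0]:.6f} {h / image_size[1]:.6f}")
print(label)  # 0 0.250000 0.255556 0.333333 0.333333
print(check_label(label, image_size, x_pos, y_pos, (w, h)))  # True
```

Calling a check like this inside the loop, right after each label is written, would flag any label that does not match its card in pixel space.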
I have tried calculating `w` and `h` in many ways, but the result is always incorrect.
I need `w` and `h` values between 0 and 1 that match each card's bounding box.
I can't see where I am going wrong, so any help is greatly appreciated. Thank you very much!