I use an SSD network to detect objects in thermal images. Since our camera has a resolution of only 160x120 pixels, I don't resize the images to 300x300 but use the camera resolution directly as the input to the SSD network. The network is defined in a prototxt file and trained using the Caffe-SSD framework.

The SSD network uses prior boxes whose sizes are scaled to the input image size. The original code that computes these sizes can be found here: https://github.com/weiliu89/caffe/blob/ssd/examples/ssd/ssd_pascal.py#L308
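For context, the sizing logic from ssd_pascal.py boils down to roughly the following (paraphrased from the linked file; treating min_dim as the smaller input dimension, i.e. 120 instead of 300, is my assumption for a non-square input):

import math

min_dim = 120                  # smaller input dimension (300 in the original)
num_source_layers = 6          # feature maps that get a PriorBox layer
min_ratio, max_ratio = 20, 90  # box scale as a percentage of min_dim
step = int(math.floor((max_ratio - min_ratio) / (num_source_layers - 2)))

min_sizes, max_sizes = [], []
for ratio in range(min_ratio, max_ratio + 1, step):
    min_sizes.append(min_dim * ratio / 100.0)
    max_sizes.append(min_dim * (ratio + step) / 100.0)
# the first (highest-resolution) layer gets an extra small prior
min_sizes = [min_dim * 10 / 100.0] + min_sizes
max_sizes = [min_dim * 20 / 100.0] + max_sizes

print(min_sizes)  # [12.0, 24.0, 44.4, 64.8, 85.2, 105.6] for min_dim=120
print(max_sizes)  # [24.0, 44.4, 64.8, 85.2, 105.6, 126.0]

So scaling to 160x120 shrinks every prior box to 40% of its original size, which is why I expected it to match my small objects better.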

When I feed the network with 160x120 images but leave the prior boxes scaled to the original resolution of 300x300, it seems to work without any problems (so the smaller resolution itself is not the problem in my case). But when I scale the prior boxes to the input resolution of 160x120, a lot of detections are missing. The same problem appears with SqueezeNet-SSD, MobileNet-SSD and ResNet-SSD.

I don't understand why - maybe the smaller prior boxes simply don't fit my dataset? I'm wondering whether it would be possible to optimize the prior box sizes and aspect ratios for my dataset. I'm trying to implement this with the K-means algorithm, but I'm not sure if I'm doing it correctly.

Following the paper "An Optimized SSD Target Detection Algorithm Based on K-Means Clustering", I run k-means (with k = the number of expected prior boxes) on the bounding boxes of my dataset with the custom metric d(box, centroid) = 1 − IOU(box, centroid) (that's why pyclustering is used for the first k-means). Each resulting cluster is then subclustered with KMeans to get the best aspect ratios for that prior box. The first coordinate of each cluster center is used as the min_size of the prior box, and max_size should be set to the min_size of the next prior box. But I'm not sure how to interpret the numbers that come out of it.
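To make the metric concrete: since each box is described only by (width, height), the IOU here treats the boxes as if they were aligned at a common corner. For example, for the boxes (40, 20) and (20, 40), the intersection is min(40, 20) * min(20, 40) = 400 and the union is 800 + 800 - 400 = 1200, so d = 1 - 400/1200 ≈ 0.67.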

Here is my code:

import os
import numpy as np
import xml.etree.ElementTree as ET

from matplotlib import pyplot as plt
from sklearn.cluster import KMeans
from pprint import pprint

from pyclustering.cluster.kmeans import kmeans
from pyclustering.cluster.center_initializer import random_center_initializer
from pyclustering.utils.metric import distance_metric, type_metric

def xml_to_boxes(path, rescale_width=None, rescale_height=None):
    """Collect the (width, height) of every bounding box in a folder of
    Pascal VOC XML annotations, optionally rescaled to a target resolution."""
    xml_list = []
    filenames = [os.path.join(path, f) for f in os.listdir(path) if f.endswith('.xml')]
    for xml_file in filenames:
        root = ET.parse(xml_file).getroot()
        for member in root.findall('object'):
            bndbox = member.find('bndbox')
            bbox_width = int(bndbox.find('xmax').text) - int(bndbox.find('xmin').text)
            bbox_height = int(bndbox.find('ymax').text) - int(bndbox.find('ymin').text)
            if rescale_width and rescale_height:
                size = root.find('size')
                bbox_width = bbox_width * (rescale_width / int(size.find('width').text))
                bbox_height = bbox_height * (rescale_height / int(size.find('height').text))
            xml_list.append([bbox_width, bbox_height])
    return np.array(xml_list)
  
def iou_distance(box1, box2):
    """Distance d(box, centroid) = 1 - IOU, where boxes are given as (width, height)
    and the IOU is computed as if both boxes were aligned at a common corner."""
    w1, h1 = box1
    w2, h2 = box2
    inter_area = min(w1, w2) * min(h1, h2)
    iou = inter_area / float(w1 * h1 + w2 * h2 - inter_area)
    return 1 - iou

dataset = xml_to_boxes("./annotations", rescale_width=160, rescale_height=120)
print("Size: ", len(dataset))

# widths and heights, used only for plotting
X = [box[0] for box in dataset]
Y = [box[1] for box in dataset]

n_clusters = 4                     # one cluster per PriorBox layer
aspect_ratio_count = [2, 3, 3, 3]  # aspect ratios to extract per prior box

metric = distance_metric(type_metric.USER_DEFINED, func=iou_distance)
initial_centers = random_center_initializer(dataset, n_clusters, random_state=0).initialize()
# note: pyclustering's keyword is 'itermax' ('max_iter' would be silently ignored);
# a user-defined Python metric also makes pyclustering fall back to its
# pure-Python implementation instead of the C core
km = kmeans(dataset, initial_centers=initial_centers, itermax=2000, metric=metric)
km.process()
 
centers = np.array(km.get_centers())
clusters = km.get_clusters()
# order clusters by mean width so they line up with the sorted centers below
sorted_clusters = sorted(clusters, key=lambda c: np.mean([dataset[i][0] for i in c]))
                
# color the points of each cluster and mark the cluster centers
colors = np.random.rand(len(clusters), 3)
color_array = np.zeros((len(X), 3))
for i, cluster in enumerate(clusters):
    color_array[cluster, :] = colors[i]

plt.scatter(X, Y, c=color_array)
plt.scatter(centers[:, 0], centers[:, 1], marker='x', color='black')

# cluster centers sorted by width; the first coordinate becomes min_size
sorted_centers = centers[np.argsort(centers[:, 0])]
pprint(sorted_centers)

prior_boxes = []
for i in range(len(sorted_centers)):
    # subcluster each size cluster to find its dominant aspect ratios
    subcluster_data = [dataset[j] for j in sorted_clusters[i]]
    sub_kmeans = KMeans(init='random', n_clusters=aspect_ratio_count[i],
                        n_init=10, random_state=0).fit(subcluster_data)
    aspect_ratios = [round(w / h, 1) for w, h in sub_kmeans.cluster_centers_]
    prior_boxes.append({
        'min_size': round(sorted_centers[i][0], 1),
        'aspect_ratio': aspect_ratios
    })

pprint(prior_boxes)
plt.show(block=True)

The output is four prior boxes, whose values I should fill into the prototxt file for each PriorBox layer:

[{'aspect_ratio': [1.1, 1.0], 'min_size': 12.2},
 {'aspect_ratio': [1.2, 0.7, 1.5], 'min_size': 36.5},
 {'aspect_ratio': [4.9, 5.0, 4.7], 'min_size': 37.9},
 {'aspect_ratio': [1.3, 0.7, 0.7], 'min_size': 61.3}]
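To turn this into the actual prototxt parameters, I would continue from the code above and apply the max_size rule from the paper (each prior box's max_size = the next prior box's min_size). Capping the last max_size at 160 (the larger input dimension) is my own assumption, not something the paper prescribes:

min_sizes = [box['min_size'] for box in prior_boxes]
# max_size of each prior box is the min_size of the next one;
# the cap of 160.0 for the last box is my own assumption
max_sizes = min_sizes[1:] + [160.0]
for box, max_size in zip(prior_boxes, max_sizes):
    print('min_size: %.1f  max_size: %.1f  aspect_ratio: %s'
          % (box['min_size'], max_size, box['aspect_ratio']))

These values would go into the prior_box_param of the corresponding PriorBox layers (Caffe-SSD always adds aspect ratio 1 implicitly and, with flip: true, also the reciprocal of each listed ratio).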

But are these values really correct? Should there be such a large step (61 -> 160) for the last prior box? Any other ideas?

Here is the plot of all boxes separated into clusters, with the cluster centers marked:

[plot: K-means clusters]

