I am trying to compute Object Keypoint Similarity (OKS) to evaluate the keypoint detections of an algorithm. Below is the code I've written, based on what I found and understood from here:
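For reference, the OKS formula from the COCO keypoint evaluation, as I understand it, is:

OKS = sum_i [ exp(-d_i^2 / (2 * s^2 * k_i^2)) * delta(v_i > 0) ] / sum_i [ delta(v_i > 0) ]

where d_i is the Euclidean distance between the i-th ground-truth and predicted keypoint, s^2 is the object scale (area), k_i is the per-keypoint constant (sigma), and v_i is the visibility flag of the i-th ground-truth keypoint.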
import numpy as np

def oks(gt, preds, threshold, v, gt_area):
    # per-keypoint constants (sigmas) from the COCO keypoint evaluation
    sigmas = np.array([.26, .25, .25, .35, .35, .79, .79, .72, .72, .62,
                       .62, 1.07, 1.07, .87, .87, .89, .89]) / 10.0
    vars = (sigmas * 2) ** 2
    xg, yg = gt[:, 0], gt[:, 1]
    xp, yp = preds[:, 0], preds[:, 1]
    vg = np.ravel(v) + 1  # add one to visibility tags; flatten in case v is 17x1
    dx = xg - xp
    dy = yg - yp
    # squared keypoint distances, normalised by object area and per-keypoint variance
    e = (dx**2 + dy**2) / vars / (gt_area + np.spacing(1)) / 2
    if threshold is not None:
        e = e[vg > threshold]  # keep only sufficiently visible keypoints
    ious = np.sum(np.exp(-e)) / (1.5 * e.shape[0]) if len(e) != 0 else 0
    return ious
where,
gt, preds
are 17x2 NumPy arrays containing the 17 (x, y) coordinates of a human pose, for the ground truth and for the prediction from the machine-learning model respectively,
threshold
= 0.5 (the COCO dataset uses 0.5 as a soft threshold),
v
= visibility of the ground-truth keypoints (a 17x1 NumPy array) with values 0 = visible and 1 = occluded (hence vg = v + 1 to comply with the OKS formula), and
gt_area
= area of the bounding box of the ground-truth person.
I was under the impression that OKS is supposed to yield a value for each keypoint, but the code above produces a single value for all the keypoints combined. Am I doing something wrong here?
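To make the shapes concrete, here is a stripped-down version of the same computation on dummy data (the keypoint coordinates and the area are made up just to show the array shapes):

```python
import numpy as np

# per-keypoint constants (sigmas) from the COCO keypoint evaluation
sigmas = np.array([.26, .25, .25, .35, .35, .79, .79, .72, .72, .62,
                   .62, 1.07, 1.07, .87, .87, .89, .89]) / 10.0
vars = (sigmas * 2) ** 2

rng = np.random.default_rng(0)
gt = rng.uniform(0, 100, size=(17, 2))       # made-up ground-truth keypoints
preds = gt + rng.normal(0, 2, size=(17, 2))  # made-up predictions nearby
gt_area = 100.0 * 200.0                      # made-up bounding-box area

dx = gt[:, 0] - preds[:, 0]
dy = gt[:, 1] - preds[:, 1]
e = (dx**2 + dy**2) / vars / (gt_area + np.spacing(1)) / 2

print(np.exp(-e).shape)                         # (17,) -- one similarity term per keypoint
print(np.sum(np.exp(-e)) / (1.5 * e.shape[0]))  # a single scalar
```

So np.exp(-e) is still a length-17 vector, one similarity term per keypoint; it is the final np.sum that collapses them into one number.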