Assumption
You ask specifically about the 1D case, so we will solve the 1D case here, but the method is essentially the same for 2D.
Let us assume you have two ground truth bounding boxes: box 1 and box 2.
Further, let us assume that our model is not so great and predicts more than 2 boxes
(maybe it found something new, maybe it broke one box into two).
For this demonstration let us consider that this is what we are working with:
# labels
# box 1: x----y
# box 2: x++++y
#  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
#              x--------y        x+++++++++++++++++++++++++++++y   TRUTH
#              a-----------b                                        PRED 1, BOX 1
#                    a+++++++++++++++++b                            PRED 2, BOX 2
#                 a++++++++++++++++++++++++++++++++b                PRED 3, BOX 2
Core Problem
What you want is, in effect, a score for how well your predictions align with the targets... but, oh no, which targets
belong to which predictions?
Pick your distance function of choice and pair each prediction with a target based on that function.
In this case I will use a modified intersection over union (IOU) adapted to the 1D case.
I chose this function because I wanted both PRED 2 and PRED 3 from the above diagram to align to box 2.
Once each prediction has a score against each target, pair it with the target that produced the best score.
Now that every prediction is paired with a target, calculate whatever metric you want.
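For example, here is why PRED 2 pairs with box 2 rather than box 1 under this modified IOU (worked by hand from the function defined below): PRED 2 = [6, 12] overlaps box 1 = [4, 7] by 1 over a union of 8 (≈ 0.13), but overlaps box 2 = [10, 20] by 2 over a union of 14 (≈ 0.14), so box 2 wins, if only barely.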
Demo with above assumption
from the above assumptions:
pred_boxes = [
    [4, 8],
    [6, 12],
    [5, 16]
]

true_boxes = [
    [4, 7],
    [10, 20]
]
a 1d version of intersection over union:
def iou_1d(predicted_boundary, target_boundary):
    '''Calculates the intersection over union (IOU) based on a span.

    Notes:
        boundaries are provided in the form of [start, stop].
        boundaries where start = stop are accepted
        boundaries are assumed to lie in the range [0, inf)

    Args:
        predicted_boundary (list): the [start, stop] of the predicted boundary
        target_boundary (list): the ground truth [start, stop] to compare against

    Returns:
        iou (float): the IOU bounded in [0, 1]
    '''
    p_lower, p_upper = predicted_boundary
    t_lower, t_upper = target_boundary

    # boundaries are in form [start, stop] and 0 <= start <= stop
    assert 0 <= p_lower <= p_upper
    assert 0 <= t_lower <= t_upper

    # no overlap: pred is entirely left or entirely right of the target
    if p_upper < t_lower or p_lower > t_upper:
        return 0

    # identical boundaries (also covers the degenerate start == stop case)
    if predicted_boundary == target_boundary:
        return 1

    intersection_lower_bound = max(p_lower, t_lower)
    intersection_upper_bound = min(p_upper, t_upper)

    intersection = intersection_upper_bound - intersection_lower_bound
    union = max(t_upper, p_upper) - min(t_lower, p_lower)

    # guard against division by zero for degenerate (point) boundaries
    union = union if union != 0 else 1
    return min(intersection / union, 1)
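A quick hand-checked sanity pass over the demo boxes (values traced by hand against the definition above):

iou_1d([4, 8], [4, 7])     # 0.75   PRED 1 vs box 1: overlap 3, union 4
iou_1d([4, 8], [10, 20])   # 0      PRED 1 vs box 2: no overlap
iou_1d([6, 12], [10, 20])  # ~0.143 PRED 2 vs box 2: overlap 2, union 14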
some simple helpers:
from math import sqrt

def euclidean(u, v):
    return sqrt((u[0] - v[0])**2 + (u[1] - v[1])**2)

def mean(arr):
    return sum(arr) / len(arr)
how we align our boundaries:
def align_1d(predicted_boundary, target_boundaries, alignment_scoring_fn=iou_1d, take=max):
    '''Aligns predicted_boundary to the closest target_boundary based on the
    alignment_scoring_fn.

    Args:
        predicted_boundary (list): the predicted boundary in form of [start, stop]
        target_boundaries (list): a list of all valid target boundaries, each having
            form [start, stop]
        alignment_scoring_fn (function): a function taking two arguments, each of
            which is a list of two elements, the first assumed to be the predicted
            boundary and the latter the target boundary. Should return a single number.
        take (function): should be either min or max. Selects either the highest or
            lowest score according to the alignment_scoring_fn

    Returns:
        aligned_boundary (list): the aligned boundary in form [start, stop]
    '''
    # score the prediction against every target
    scores = [
        alignment_scoring_fn(predicted_boundary, target_boundary)
        for target_boundary in target_boundaries
    ]

    # boundary did not overlap any target box; fall back to inverse Euclidean
    # distance so the nearest target still wins
    if not any(scores):
        scores = [
            1 / euclidean(predicted_boundary, target_boundary)
            for target_boundary in target_boundaries
        ]

    aligned_index = scores.index(take(scores))
    aligned = target_boundaries[aligned_index]
    return aligned
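Two hand-checked calls, one exercising the IOU path and one the Euclidean fallback (the second uses the class 1 boxes from your question):

align_1d([6, 12], true_boxes)       # [10, 20] -- IOU favours box 2 (0.143 > 0.125)
align_1d([8, 8], [[1, 3], [6, 9]])  # [6, 9]   -- both IOUs are 0, so 1/euclidean picks the nearer box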
how we calculate difference:
def diff(u, v):
    return [u[0] - v[0], u[1] - v[1]]
combine it all into one:
def aligned_distance_1d(predicted_boundaries, target_boundaries,
                        alignment_scoring_fn=iou_1d, take=max,
                        distance_fn=diff, aggregate_fn=mean):
    '''Returns the aggregated distance of predicted bounding boxes to their
    aligned target bounding boxes, based on alignment_scoring_fn and distance_fn.

    Args:
        predicted_boundaries (list): a list of all predicted boundaries, each
            having form [start, stop]
        target_boundaries (list): a list of all valid target boundaries, each
            having form [start, stop]
        alignment_scoring_fn (function): a function taking two arguments, each of
            which is a list of two elements, the first assumed to be the predicted
            boundary and the latter the target boundary. Should return a single number.
        take (function): should be either min or max. Selects either the highest or
            lowest score according to the alignment_scoring_fn
        distance_fn (function): a function taking two lists and returning a
            single value.
        aggregate_fn (function): a function taking a list of numbers (distances
            calculated by distance_fn) and returning a single value (the aggregated
            distance)

    Returns:
        aggregated_distance (list): the aggregated [start, stop] distance of the
            aligned predicted_boundaries, i.e. one aggregate_fn value per
            coordinate of the distance_fn output
    '''
    # pair each prediction with the target it aligns to best
    paired = [
        (predicted_boundary,
         align_1d(predicted_boundary, target_boundaries, alignment_scoring_fn, take))
        for predicted_boundary in predicted_boundaries
    ]

    # per-pair [start, stop] differences
    distances = [distance_fn(*pair) for pair in paired]

    # aggregate start errors together and stop errors together
    aggregated = [aggregate_fn(error) for error in zip(*distances)]
    return aggregated
run:
aligned_distance_1d(pred_boxes, true_boxes)
# [-3.0, -3.6666666666666665]
Note: for many predictions and many targets there are many ways to optimize this code. Here I broke it up into its main functional chunks so it is clear what is going on.
Now, does this make sense? Well, since I wanted PRED 2 and 3 to align to box 2: yes. Both of those predictions start before their target starts and both end prematurely, which is why the averaged errors are negative.
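For reference, here is the pairing and the per-pair differences behind that result (traced by hand from the functions above):

# PRED 1 [4, 8]  -> box 1 [4, 7]  : diff = [ 0,  1]
# PRED 2 [6, 12] -> box 2 [10, 20]: diff = [-4, -8]
# PRED 3 [5, 16] -> box 2 [10, 20]: diff = [-5, -4]
# mean of starts = (0 - 4 - 5) / 3 = -3.0
# mean of stops  = (1 - 8 - 4) / 3 ≈ -3.67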
Solution to question asked
copy pasted your examples:
# "detected" objects
p_obj = [
[[2, 3], [8, 8]], # class 1
[[4, 4], [6, 7]], # class 2
[[0, 0]] # class 3
]
# true objects
t_obj = [
[[1, 3], [6, 9]], # class 1
[[4, 7]], # class 2
[[0, 0]] # class 3
]
since you know the boxes per class this is easy:
[
    aligned_distance_1d(p_obj[cls_no], t_obj[cls_no])
    for cls_no in range(len(t_obj))
]
# [[1.5, -0.5], [1.0, -1.5], [0.0, 0.0]]
Does this output make sense?
Starting with a sanity check, let us look at class 3. The average [start, stop] distances are both 0. Makes sense.
How about class 1? Both predictions start too late (2 > 1, 8 > 6) but only one ends too soon (8 < 9). So that makes sense too.
Now let us look at class 2, which seems to be why you asked the question (more predictions than targets).
If we were to draw what the score suggests, it would look like this:
#  0  1  2  3  4  5  6  7  8  9
#              ----------      # truth [4, 7]
#                 ++           # pred  [4 + 1, 7 - 1.5]
It doesn't look so great, but this is just an example...
Does this make sense? Yes and no. Yes, in terms of how we calculated the metric: one prediction stopped 3 values too soon and the other started 2 too late.
No, in the sense that neither of your predictions actually covers the value 5, and yet this metric leads you to believe that it is covered...
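For completeness, the class 2 pairing that produces [1.0, -1.5] (traced by hand):

# pred [4, 4] -> truth [4, 7]: diff = [0, -3]   (stops 3 too soon)
# pred [6, 7] -> truth [4, 7]: diff = [2,  0]   (starts 2 too late)
# mean = [(0 + 2) / 2, (-3 + 0) / 2] = [1.0, -1.5]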
Conclusion
Is this a faulty metric?
Depends on what you are using it for / trying to show.
However, since you use a binary mask to generate your predicted boundaries, that is a non-negligible root of this problem. Perhaps there is a better strategy for extracting boundaries from your label probabilities.