I am using CNTK's implementation of Fast R-CNN (released on GitHub).
Selective Search was not giving me good region proposals, so I wrote something better suited to my data (I am dealing with scanned documents). My task is to identify watermarks in documents and bound a tight box around them. Extending CNTK's object detection tutorial to identify horizontally aligned watermarks was pretty straightforward, giving me decent accuracy. Although the network uses AlexNet's conv weights (transfer learning), it seems to generalize pretty well to images containing text.

Now I am running into the issue of identifying rotated watermarks (rotated by some arbitrary angle).
I have a few questions about this problem:
The "out-of-the-box" Regression Head outputs
4 numbers -> (topX, topY, width, height)
;
However this representation does not allow for rotated rectangles. I understand that when creating my ground truth boxes, I must draw rotated rectangles as well as have rotated region proposals. In what way do I change the network architecture to predict boxes like this?5 numbers -> (topX, topY, width, height, angle)
: similar tocv2.minAreaRect()
function?8 numbers -> (x1, y1, x2, y2, x3, y3, x4, y4)
?
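For concreteness, here is a minimal sketch (with made-up corner coordinates) of the 5-number form that `cv2.minAreaRect()` produces. One thing I noticed is that it actually parameterizes by the box *center* rather than the top-left corner, and `cv2.boxPoints()` converts back to the 8-number corner form:

```python
import cv2
import numpy as np

# Hypothetical corner points of a rotated watermark box,
# just to illustrate the two parameterizations.
corners = np.array([[100, 50], [186, 100], [161, 143], [75, 93]],
                   dtype=np.float32)

# Fit the minimum-area rotated rectangle around the points.
# Returns ((centerX, centerY), (width, height), angle) -- 5 numbers.
(cx, cy), (w, h), angle = cv2.minAreaRect(corners)

# Convert the 5-number form back to the 8-number corner form
# (a 4x2 array), so the two representations are interchangeable.
pts = cv2.boxPoints(((cx, cy), (w, h), angle))
```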
Does the algorithm even care that the object is rotated? Am I making it harder than it should be? I have read of others purposefully applying image augmentations (changing scale and rotating) to train a more robust model. When this sort of augmentation is done, is the model able to recognize and bound a tight rotated rectangle/square around the object of interest?
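For context, this is roughly the kind of augmentation I have in mind, a minimal sketch assuming OpenCV and ground truth stored as a 4x2 array of corners (the helper name and the white border fill are my own choices, not something from CNTK's tutorial):

```python
import cv2
import numpy as np

def rotate_image_and_box(image, corners, angle_deg):
    """Rotate a scanned page and its watermark's corner points together."""
    h, w = image.shape[:2]
    center = (w / 2.0, h / 2.0)

    # 2x3 affine matrix rotating about the image center.
    M = cv2.getRotationMatrix2D(center, angle_deg, 1.0)

    # Warp the page; fill exposed borders with white, which matches
    # the background of a scanned document.
    rotated = cv2.warpAffine(image, M, (w, h),
                             borderValue=(255, 255, 255))

    # Apply the same affine transform to the 4x2 corner array.
    ones = np.ones((corners.shape[0], 1))
    rotated_corners = np.hstack([corners, ones]) @ M.T
    return rotated, rotated_corners
```

My worry is that if the regression head still only outputs the usual 4 numbers, the tightest box it can learn is the axis-aligned box around those rotated corners, which is not tight at all for a strongly rotated watermark.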