
I am trying to build a CNN (in Keras) that can estimate the rotation of an image (or a 2d object). So basically, the input is an image and the output should be its rotation.

My first experiment is to estimate the rotation of MNIST digits (starting with only one digit class, let's say the "3"). So what I did was extract all 3s from the MNIST set and then build a "rotated 3s" dataset by randomly rotating these images multiple times, storing the rotated images together with their rotation angles as ground-truth labels.
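A minimal sketch of that dataset-building step, using `scipy.ndimage.rotate` (the function name `build_rotated_dataset`, the variable `threes` and the fixed 20° step are my assumptions, not necessarily the exact setup):

```python
import numpy as np
from scipy.ndimage import rotate

def build_rotated_dataset(threes, step_deg=20):
    """Rotate each image at every multiple of step_deg and record the angle.

    `threes` is assumed to be an (N, 28, 28) array of unrotated MNIST "3"s.
    """
    images, angles = [], []
    for ang in range(0, 360, step_deg):
        for img in threes:
            # reshape=False keeps the 28x28 shape; corners get clipped/padded
            images.append(rotate(img, ang, reshape=False))
            angles.append(ang)
    return np.array(images), np.array(angles)
```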

My first problem was that a 2d rotation is cyclic and I didn't know how to model this behavior. Therefore, I encoded the angle as y = sin(ang), x = cos(ang). This gives me my dataset (the rotated 3s images) and the corresponding labels (x and y values).
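In code, that encoding and its inverse (via `atan2`) look roughly like this (a NumPy sketch; the function names are mine):

```python
import numpy as np

def encode_angle(ang_deg):
    """Map an angle in degrees to the cyclic (x, y) = (cos, sin) representation."""
    rad = np.deg2rad(ang_deg)
    return np.cos(rad), np.sin(rad)

def decode_angle(x, y):
    """Recover the angle in [0, 360) from the (cos, sin) pair."""
    return np.rad2deg(np.arctan2(y, x)) % 360.0
```

Note that 0° and 360° encode to the same (x, y) point, which is exactly the cyclic behavior the encoding is meant to capture.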

For the CNN, as a start, I just took the Keras MNIST CNN example (https://keras.io/examples/mnist_cnn/) and replaced the last dense layer (which had 10 outputs and a softmax activation) with a dense layer that has 2 outputs (x and y) and a tanh activation (since y = sin(ang) and x = cos(ang) are within [-1, 1]).
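The described change to the example's head could look like this (a sketch assuming TensorFlow's bundled Keras and 28×28 grayscale inputs; the conv stack follows the linked MNIST example only loosely, and in current Keras the old `cosine_proximity` loss corresponds to `CosineSimilarity`):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    # Regression head: (x, y) = (cos, sin), both in [-1, 1], hence tanh
    layers.Dense(2, activation="tanh"),
])
# CosineSimilarity returns the negative cosine similarity as the loss value,
# so minimizing it aligns predictions with the target direction.
model.compile(optimizer="adam", loss=keras.losses.CosineSimilarity())
```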

The last thing I had to decide was the loss function, where I basically want a distance measure for angles. Therefore I thought "cosine_proximity" was the way to go.
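To spell out that intuition: for unit-length (cos, sin) vectors, the negative cosine similarity equals -cos(true - pred), so it grows monotonically with the angular error, which is what makes it look like a reasonable angle distance (a quick NumPy check, not Keras code):

```python
import numpy as np

def cosine_proximity(true_deg, pred_deg):
    """Negative cosine similarity between the two (cos, sin) unit vectors.

    For unit vectors this equals -cos(true - pred): it is -1 for a perfect
    match, +1 for a 180-degree error, and monotone in between.
    """
    t, p = np.deg2rad(true_deg), np.deg2rad(pred_deg)
    true_vec = np.array([np.cos(t), np.sin(t)])
    pred_vec = np.array([np.cos(p), np.sin(p)])
    return -np.dot(true_vec, pred_vec)
```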

When training the network I can see that the loss decreases and converges to a certain point. However, when I then check the predictions against the ground truth, I observe a (for me) fairly surprising behavior: almost all x and y predictions tend towards 0 or ±1. And since the "decoding" of my rotation is ang = atan2(y, x), the predictions are usually 0°, ±45°, ±90°, ±135° or 180°. However, my training and test data only contain angles of 0°, 20°, 40°, ..., 360°. This doesn't really change if I change the complexity of the network. I also played around with the optimizer parameters, without any success.

Is there anything wrong with these assumptions:
  • x, y encoding for the angle
  • tanh activation to have values in [-1, 1]
  • cosine_proximity as the loss function

Thanks in advance for any advice, tips or pointing me towards a possible mistake I made!

thowol

1 Answer


It's hard to give you an exact answer so let's try with some ideas:

  • Change from cosine proximity to MSE or other losses and check if anything changes.
  • Change the way you encode the target. You could just represent the angle as a number between 0 and 1. It doesn't seem to be a problem even if the angles are cyclic.
  • Ensure your preprocessing/augmentation steps make sense for this particular task.
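One caveat with a plain [0, 1] target: the wrap-around at 0/1 means a naive absolute difference overstates the error near the boundary, so a cyclic distance is needed instead (a sketch; the function name is mine):

```python
import numpy as np

def cyclic_error(y_true, y_pred):
    """Wrap-around distance between two angles encoded in [0, 1].

    A plain |y_true - y_pred| would report 0.9 for targets 0.0 vs 0.9,
    while the true cyclic error is only 0.1.
    """
    diff = np.abs(y_true - y_pred) % 1.0
    return np.minimum(diff, 1.0 - diff)
```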
marco romelli
  • Thanks for the comment! As for the suggestions: 1. I tried MSE with more or less the same result. 2. I'm not sure the [0, 1] encoding makes sense. Do you mean [0,360°] -> [0,1]? How would that work in terms of the loss? First off 0 and 1 would both be correct for a 0° rotation. Secondly, if we assume our target is 0° (0 encoded) and our current value is 324° (0.9 encoded). Therefore in the [0, 1] encoding the error is 324° (or 0.9 in the encoding). But in reality it is only 36° (0.1 in the encoding). You agree? 3. The only preproc. I do is rotating the images, so nothing fancy there. – thowol Jun 11 '19 at 11:53
  • Before diving deep in the loss analysis, you said your preprocessing includes rotations. Rotating the input image changes everything in your case; if you rotate the input by 90° you also have to do the same on the groundtruth. Can you confirm you're doing this correctly? – marco romelli Jun 13 '19 at 07:16
  • Of course. I take all the "3" images (unrotated) and annotate them with the labels x=cos(0°), y=sin(0°). Then, I rotate all images by 20° and annotate these images with x=cos(20°), y=sin(20°) and add them to the (unrotated) dataset. I do this for 40°, 60°, ... and so forth. That is the dataset (split into test/train) that I use... – thowol Jun 14 '19 at 08:16
  • It seems you are getting `atan2(1,-1)`, `atan2(1,0)`, `atan2(1,1)` and so on. Possibly the `tanh` activation is not able to stabilize at intermediate points. BTW you should post some code so that people have something concrete to work on. – marco romelli Jun 14 '19 at 08:36
  • Has somebody found a solution to this? How would you apply cyclic encoding/ decoding for learning rotation in images? – Sasha Sen Oct 22 '21 at 19:16