I'm looking to constrain one layer of my neural network to specifically find the best rotation of its input in order to satisfy an objective. (My end goal, where `R` is the rotation layer, is of the form `R.transpose() @ f(R @ z)`.)
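For concreteness, here's a rough PyTorch-style sketch of that forward pass (PyTorch is just my assumption here, and `rotated_forward` / `f` are placeholder names, not anything real):

```python
import torch

def rotated_forward(R: torch.Tensor, f, z: torch.Tensor) -> torch.Tensor:
    # R: a (d, d) rotation matrix, z: a d-dimensional input vector,
    # f: whatever learned sub-network sits in the middle.
    # Rotate z into R's frame, apply f, then rotate back with R^T.
    return R.T @ f(R @ z)
```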
I am looking to train this (+ other components) via gradient descent. If `z` is just two-dimensional, then I can just write

R = [ cos(theta)  -sin(theta)
      sin(theta)   cos(theta) ]

and have `theta` be a learnable parameter.
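A minimal sketch of how I'd set up that 2-D version (again assuming PyTorch; `Rotation2D` is just a name I made up):

```python
import torch
import torch.nn as nn

class Rotation2D(nn.Module):
    """Learnable 2-D rotation parameterized by a single angle theta."""

    def __init__(self):
        super().__init__()
        self.theta = nn.Parameter(torch.zeros(()))  # scalar learnable angle

    def matrix(self) -> torch.Tensor:
        c, s = torch.cos(self.theta), torch.sin(self.theta)
        # R = [[cos(theta), -sin(theta)],
        #      [sin(theta),  cos(theta)]],
        # built with stack so gradients flow back to theta through cos/sin.
        return torch.stack([torch.stack([c, -s]),
                            torch.stack([s, c])])
```

`Rotation2D().matrix()` would then be the `R` that gets dropped into `R.transpose() @ f(R @ z)` above.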
However, I am lost on how to actually set this up for a d-dimensional space (where d > 10). I've tried looking at resources on how to build a d-dimensional rotation matrix, but they get heavy into linear algebra and are way over my head. It feels like this should be easier than it seems, so I suspect I'm overlooking something (like maybe `R` should just be a usual linear layer without any non-linear activations).
Anyone have any ideas? I appreciate you in advance :)