
I'm looking to constrain one layer of my neural network to find the best rotation of its input in order to satisfy an objective. (My end goal, where R is the rotation layer, is of the form R.transpose() @ f(R @ z).)

I am looking to train this (plus other components) via gradient descent. If z is just two-dimensional, then I can write

R = [ cos(theta)   -sin(theta)
      sin(theta)    cos(theta)]

and have theta be a learnable parameter. However, I am lost on how to actually set this up for a d-dimensional space (where d > 10). The resources I've found on constructing a d-dimensional rotation matrix get deep into linear algebra that is over my head. It feels like this should be easier than it seems, so I suspect I'm overlooking something (for example, maybe R should just be an ordinary linear layer without any non-linear activation).
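
For concreteness, here is a minimal PyTorch sketch of the 2-D setup I have in mind (the Rotation2D class name is just for illustration):

    import torch
    import torch.nn as nn

    class Rotation2D(nn.Module):
        """2-D rotation layer driven by a single learnable angle theta."""
        def __init__(self):
            super().__init__()
            self.theta = nn.Parameter(torch.zeros(1))  # learnable angle

        def R(self):
            c, s = torch.cos(self.theta), torch.sin(self.theta)
            # [[cos, -sin], [sin, cos]], built so gradients flow back to theta
            return torch.stack([torch.cat([c, -s]), torch.cat([s, c])])

        def forward(self, z):
            return z @ self.R().T  # row-vector form of R @ z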

Anyone have any ideas? I appreciate you in advance :)

Sean K
  • A cross-site post: https://stats.stackexchange.com/q/546220/144441 – OmG Sep 27 '21 at 23:44
  • Sorry, yeah, I wasn't sure whether this would fit Stack Overflow or Cross Validated, so I posted in both places. I believe the Cross Validated answer is good, so should I copy it here (with credit), or should I just delete this post? – Sean K Sep 28 '21 at 15:32

1 Answer

QR decomposition can help with this (since Q is orthogonal): let W be an unconstrained learnable matrix (without a bias term), compute the decomposition W = QR, and then actually use Q as your orthogonal matrix. If you use PyTorch's QR, backprop will be able to flow back through the decomposition and update W.
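
A minimal sketch of that idea, assuming PyTorch's torch.linalg.qr (the RotationLayer name, the near-identity initialization, and the small f network are illustrative choices, not part of the answer):

    import torch
    import torch.nn as nn

    class RotationLayer(nn.Module):
        """d-dimensional orthogonal layer: W is unconstrained, Q comes from QR."""
        def __init__(self, d):
            super().__init__()
            # Unconstrained learnable matrix, no bias term; the near-identity
            # init (an illustrative choice) starts Q close to the identity.
            self.W = nn.Parameter(torch.eye(d) + 0.01 * torch.randn(d, d))

        def orthogonal(self):
            # torch.linalg.qr is differentiable, so backprop reaches W through Q
            Q, _ = torch.linalg.qr(self.W)
            return Q

    # Usage in the question's R.transpose() @ f(R @ z) form, with z as batched rows:
    d = 16
    rot = RotationLayer(d)
    f = nn.Sequential(nn.Linear(d, d), nn.Tanh(), nn.Linear(d, d))  # stand-in for f
    z = torch.randn(8, d)

    Q = rot.orthogonal()
    out = f(z @ Q.T) @ Q   # row-vector version of R.transpose() @ f(R @ z)
    out.sum().backward()   # gradients flow through the QR back to rot.W

One caveat worth knowing: the Q from a QR decomposition is orthogonal but its determinant may be -1, i.e. a reflection rather than a pure rotation, so check whether that matters for your objective.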

Sean K