Is it possible to set different learning rates for different variables in the same layer in TensorFlow?
For example, in a dense layer, how can you set a learning rate of 0.001 for the kernel while setting the learning rate for the bias to be 0.005?
One workaround is to split the layer in two: in one layer you train only the kernel (with a non-trainable zero bias), and in the other you train only the bias (with a non-trainable identity kernel). One can then use tfa.optimizers.MultiOptimizer to assign a different learning rate to each layer. But this slightly slows down training, because the updates of the bias and the kernel are no longer parallelised. So I'm wondering: is there a standard way of setting different learning rates for different variables within the same layer in TF?
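To make the workaround concrete, here is a minimal sketch of the split-layer idea. It uses two plain Keras optimizers applied manually in a custom step rather than tfa.optimizers.MultiOptimizer (which would pair each optimizer with its layer in the same way); the `BiasAdd` layer is a hypothetical stand-in for a dense layer whose kernel is frozen to the identity:

```python
import tensorflow as tf

# Layer 1: trains only the kernel (no bias at all, equivalent to a frozen zero bias).
kernel_layer = tf.keras.layers.Dense(3, use_bias=False)


class BiasAdd(tf.keras.layers.Layer):
    """Trains only a bias; stands in for a Dense layer with a frozen identity kernel."""

    def build(self, input_shape):
        self.bias = self.add_weight(
            name="bias", shape=(input_shape[-1],), initializer="zeros")

    def call(self, x):
        return x + self.bias


bias_layer = BiasAdd()

# One optimizer per variable group, with the learning rates from the question.
opt_kernel = tf.keras.optimizers.SGD(learning_rate=0.001)
opt_bias = tf.keras.optimizers.SGD(learning_rate=0.005)

# Dummy regression batch.
x = tf.random.normal((8, 4))
y = tf.random.normal((8, 3))

with tf.GradientTape() as tape:
    pred = bias_layer(kernel_layer(x))
    loss = tf.reduce_mean((pred - y) ** 2)

variables = kernel_layer.trainable_variables + bias_layer.trainable_variables
grads = tape.gradient(loss, variables)

# Apply each group's gradients with its own optimizer / learning rate.
opt_kernel.apply_gradients(zip(grads[:1], kernel_layer.trainable_variables))
opt_bias.apply_gradients(zip(grads[1:], bias_layer.trainable_variables))
```

With tfa.optimizers.MultiOptimizer the manual bookkeeping disappears: you would pass `[(opt_kernel, kernel_layer), (opt_bias, bias_layer)]` and compile the model as usual, but the per-variable-group bookkeeping shown above is what it does internally.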