I'm reading the Fast R-CNN paper.
In Section 2.3, under 'SGD hyper-parameters', it says that all layers use a per-layer learning rate of 1 for weights and 2 for biases, and a global learning rate of 0.001.
Is 'per-layer learning rate' the same thing as a 'layer-specific learning rate', i.e. a different learning rate for each layer? If so, I can't understand how a 'per-layer learning rate' and a 'global learning rate' can be applied at the same time.
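The only way I can imagine them working together is if the per-layer values are multipliers applied on top of the global rate, something like this (this is just my guess, the paper doesn't spell it out):

base_lr = 0.001          # global learning rate
weight_lr = 1 * base_lr  # per-layer rate of 1 for weights -> 0.001
bias_lr = 2 * base_lr    # per-layer rate of 2 for biases  -> 0.002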
I also found this example of layer-specific learning rates in PyTorch:
import torch.optim as optim

optimizer = optim.SGD([
    {'params': model.some_layers.parameters()},            # uses the global lr (1e-3)
    {'params': model.other_layers.parameters(), 'lr': 1},  # per-group override (placeholder module names)
], lr=1e-3, momentum=0.9)
Is this the correct approach according to the paper?
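Or, if my multiplier guess above is right, would it look more like the sketch below, where weights and biases go into separate parameter groups with scaled learning rates? (The model here is just a stand-in, and the weight/bias split is my own code, not something the paper describes.)

import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3))  # stand-in model

base_lr = 0.001  # global learning rate from the paper
weights = [p for n, p in model.named_parameters() if not n.endswith('bias')]
biases = [p for n, p in model.named_parameters() if n.endswith('bias')]

optimizer = optim.SGD([
    {'params': weights, 'lr': 1 * base_lr},  # "per-layer learning rate of 1" for weights -> 0.001
    {'params': biases, 'lr': 2 * base_lr},   # "per-layer learning rate of 2" for biases  -> 0.002
], lr=base_lr, momentum=0.9)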
Sorry for my English.