I'm using an MLP with Keras, optimized with SGD. I want to tune the learning rate, but it seems to have no effect whatsoever on training. I tried small learning rates (0.01) as well as very large ones (up to 1e28), and the effect is barely noticeable. Shouldn't my loss explode with a very large learning rate?
I'm using a fully connected network with 3 hidden layers and sigmoid activations. The loss is a weighted variant of binary cross-entropy. The goal is to predict credit default. The training set contains 500000 examples, with approximately 2% defaults; the test set contains 200000 examples.
import keras

# weighted binary cross-entropy: p penalizes errors on the positive (default) class
def loss_custom_w(p):
    def loss_custom(y, yhat):
        y_l, y_lhat = keras.backend.flatten(y), keras.backend.flatten(yhat)
        eps = keras.backend.epsilon()
        y_lhat = keras.backend.clip(y_lhat, eps, 1 - eps)  # avoid log(0)
        return -keras.backend.mean(p * y_l * keras.backend.log(y_lhat)
                                   + (1 - y_l) * keras.backend.log(1 - y_lhat))
    return loss_custom

model = keras.Sequential([
    keras.layers.Dense(n_input),                     # first layer, n_input units
    keras.layers.Dense(500, activation='sigmoid'),
    keras.layers.Dense(400, activation='sigmoid'),
    keras.layers.Dense(170, activation='sigmoid'),
    keras.layers.Dense(120, activation='sigmoid'),
    keras.layers.Dense(1, activation='sigmoid')])    # output: probability of default

sgd = keras.optimizers.SGD(lr=1e20)                  # deliberately huge learning rate
model.compile(optimizer=sgd, loss=loss_custom_w(8))
model.fit(x_train, y_train, epochs=10, batch_size=1000)
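For concreteness, the learning-rate comparison I'm describing looks roughly like this (a sketch, not my exact script; it just rebuilds the same model for each rate and compares the reported losses):

def build_model():
    return keras.Sequential([
        keras.layers.Dense(n_input),
        keras.layers.Dense(500, activation='sigmoid'),
        keras.layers.Dense(400, activation='sigmoid'),
        keras.layers.Dense(170, activation='sigmoid'),
        keras.layers.Dense(120, activation='sigmoid'),
        keras.layers.Dense(1, activation='sigmoid')])

# a small, a large and an absurdly large learning rate;
# I would expect the last two to make the loss blow up
for lr in (1e-2, 1e20, 1e28):
    m = build_model()
    m.compile(optimizer=keras.optimizers.SGD(lr=lr), loss=loss_custom_w(8))
    hist = m.fit(x_train, y_train, epochs=2, batch_size=1000, verbose=0)
    print(lr, hist.history['loss'])

The per-epoch losses printed for the three rates are essentially indistinguishable, which is the effect I can't explain.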
Update:
- I've tried changing the activation functions to avoid vanishing gradients, but it did not help (rough sketch of what I tried after this list).
- The problem does not come from the loss function; I tried other losses too.
- The network itself actually seems to work well, and so does the custom loss: when I change the value of p, it does what it's supposed to do, and the classifier gives satisfying results. I just can't figure out why the learning rate has no effect.
- The network manages to predict labels from both classes, and it predicts the positive (default) class better when I use a large penalty value, as expected.
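Roughly what the activation and loss experiments above look like (a sketch, not the exact code I ran; relu and the built-in binary cross-entropy are just examples of the substitutions I tried):

# 1) same architecture with relu hidden layers instead of sigmoid
model_relu = keras.Sequential([
    keras.layers.Dense(n_input),
    keras.layers.Dense(500, activation='relu'),
    keras.layers.Dense(400, activation='relu'),
    keras.layers.Dense(170, activation='relu'),
    keras.layers.Dense(120, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')])

# 2) same optimizer, but with Keras' built-in loss instead of the custom one
model_relu.compile(optimizer=keras.optimizers.SGD(lr=1e20), loss='binary_crossentropy')
model_relu.fit(x_train, y_train, epochs=10, batch_size=1000)

In both cases training behaves the same way: the learning rate still makes no visible difference.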