
I'm working on a deep learning classifier (Keras and Python) that classifies time series into three categories. The loss function that I'm using is the standard categorical cross-entropy. In addition to this, I also have an attention map which is being learnt within the same model.

I would like this attention map to be as small as possible, so I'm using a regularizer. Here comes the problem: how do I set the right regularization parameter? What I want is for the network to reach its maximum classification accuracy first, and only then start minimising the intensity of the attention map. For this reason, I train my model once without the regularizer and a second time with the regularizer on. However, if the regularization parameter (lambda) is too high, the network completely loses accuracy and only minimises the attention, while if it is too small, the network only cares about the classification error and won't minimise the attention, even when the accuracy has already peaked.
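To make the setup concrete, here is a minimal sketch of what I mean (assuming tf.keras / TF 2.x; the architecture, shapes, and layer choices below are placeholders, not my actual model). The penalty weight lives in a non-trainable variable so it can be changed during training:

```python
import tensorflow as tf

# Placeholder sketch, not the actual model. The penalty weight is a
# non-trainable variable so it can be updated while training runs.
attn_lambda = tf.Variable(0.0, trainable=False)

class ScaledL1(tf.keras.regularizers.Regularizer):
    """L1 penalty on a layer's activations, scaled by a mutable weight."""
    def __init__(self, weight):
        self.weight = weight

    def __call__(self, x):
        return self.weight * tf.reduce_sum(tf.abs(x))

inputs = tf.keras.Input(shape=(128, 1))  # a univariate time series
features = tf.keras.layers.Conv1D(8, 3, padding="same", activation="relu")(inputs)
attention = tf.keras.layers.Conv1D(
    1, 1, activation="sigmoid",
    activity_regularizer=ScaledL1(attn_lambda),  # penalises attention intensity
)(features)
attended = tf.keras.layers.Multiply()([features, attention])  # apply the attention map
pooled = tf.keras.layers.GlobalAveragePooling1D()(attended)
outputs = tf.keras.layers.Dense(3, activation="softmax")(pooled)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```

With `attn_lambda` at 0 the model trains on pure cross-entropy; setting it higher adds the attention penalty to the loss.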

Is there a smarter way to combine the categorical cross-entropy with the regularizer? Maybe something that considers the variation of the categorical cross-entropy over time, so that if it doesn't decrease for, say, N iterations, only the regularizer is considered?
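One way I could imagine doing this, sketched under the same assumptions (tf.keras, and a `tf.Variable` named `attn_lambda` feeding the model's activity regularizer): a callback that leaves the penalty weight at zero and switches it on once the monitored loss has not improved for `patience` epochs. All hyperparameter values here are made up for illustration. Note that once the penalty is active, the total loss includes it, so the plain cross-entropy would need to be monitored separately from that point on.

```python
import tensorflow as tf

# Sketch assuming a tf.Variable `attn_lambda` used inside the model's
# activity regularizer. All hyperparameter values are placeholders.
class DeferredPenalty(tf.keras.callbacks.Callback):
    """Keep the attention penalty at zero until the monitored loss
    stops improving for `patience` epochs, then switch it on."""

    def __init__(self, attn_lambda, target=1e-3, patience=5,
                 min_delta=1e-4, monitor="val_loss"):
        super().__init__()
        self.attn_lambda = attn_lambda
        self.target = target
        self.patience = patience
        self.min_delta = min_delta
        self.monitor = monitor
        self.best = float("inf")
        self.wait = 0

    def on_epoch_end(self, epoch, logs=None):
        current = (logs or {}).get(self.monitor, float("inf"))
        if current < self.best - self.min_delta:
            self.best, self.wait = current, 0
        elif float(self.attn_lambda) == 0.0:
            self.wait += 1
            if self.wait >= self.patience:
                self.attn_lambda.assign(self.target)  # switch the penalty on
```

It would be used as `model.fit(..., callbacks=[DeferredPenalty(attn_lambda)])`. A smoother variant could ramp the weight up gradually instead of flipping a single switch.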

Thank you

Ale152
  • Just to clarify, when I say "regularizer" I refer to the definition of Keras documentation: "Regularizers allow to apply penalties on layer parameters or layer activity during optimization". I don't have any overfitting problem, I simply want to reduce the complexity of the attention map by penalising its intensity. – Ale152 Oct 01 '18 at 16:00

1 Answer


Regularisation is a way to fight overfitting, so you should first check whether your model overfits. A simple way to do it: compare the F1 score on the train and test sets. If the F1 score is high on train and low on test, it seems you have overfitting, so you need to add some regularisation.
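For instance (a toy sketch with made-up labels, using scikit-learn's `f1_score`):

```python
from sklearn.metrics import f1_score

# Toy illustration with made-up predictions: a large gap between
# train and test F1 suggests overfitting.
y_train_true = [0, 1, 2, 0, 1, 2]
y_train_pred = [0, 1, 2, 0, 1, 2]   # perfect on train
y_test_true  = [0, 1, 2, 0, 1, 2]
y_test_pred  = [0, 0, 2, 1, 1, 0]   # much worse on test

train_f1 = f1_score(y_train_true, y_train_pred, average="macro")
test_f1 = f1_score(y_test_true, y_test_pred, average="macro")
print(train_f1, test_f1)  # the gap is the warning sign
```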

Danylo Baibak
  • Hi, thank you for your answer. I think I must have used the wrong term, because the Keras documentation refers to "regularizer" as something that "allows to apply penalties on layer parameters or layer activity during optimization". I don't really have a problem with overfitting; I simply want to reduce the complexity of my attention map by applying a penalty on its intensity. – Ale152 Oct 01 '18 at 15:58