
I have a Keras model (Sequential) in Python 3:

import keras

class LossHistory(keras.callbacks.Callback):
    def on_train_begin(self, logs={}):
        # one entry per epoch, taken from the training logs
        self.matthews_correlation = []

    def on_epoch_end(self, epoch, logs={}):
        self.matthews_correlation.append(logs.get('matthews_correlation'))
...    
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['matthews_correlation'])
history = LossHistory()
model.fit(Xtrain, Ytrain, nb_epoch=10, batch_size=10, callbacks=[history])
scores = model.evaluate(Xtest, Ytest, verbose=1)

...
MCC = matthews_correlation(Ytest, predictions)

model.fit() prints a progress bar and, supposedly because of the metrics=['matthews_correlation'] argument, a Matthews correlation coefficient (MCC) for each epoch. But these values are quite different from what the MCC call at the end returns. That final call gives the overall MCC of the predictions and is consistent with sklearn's MCC function (i.e. I trust that value).
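
For reference, this is roughly how I compare against sklearn (just a sketch; the 0.5 threshold for turning the continuous outputs into class labels is my own choice):

from sklearn.metrics import matthews_corrcoef

# threshold the continuous network outputs at 0.5 to get 0/1 class labels
predictions = (model.predict(Xtest) > 0.5).astype(int).ravel()
print(matthews_corrcoef(Ytest.ravel(), predictions))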

1) What are the scores from model.evaluate()? They are completely different from both the final MCC and the per-epoch MCCs.

2) What are the per-epoch MCCs? The output looks like this:

Epoch 1/10 580/580 [===========] - 0s - loss: 0.2500 - matthews_correlation: -0.5817

How are they calculated, and why do they differ so much from the MCC computed at the very end?

3) Can I somehow call matthews_correlation() from within on_epoch_end()? Then I could print out an independently calculated MCC; I don't know what Keras does implicitly.
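
Something along these lines is what I have in mind (just an untested sketch; the MCCHistory name, passing Xtest/Ytest into the callback, and the 0.5 threshold are my own choices):

import numpy as np
from sklearn.metrics import matthews_corrcoef

class MCCHistory(keras.callbacks.Callback):
    def __init__(self, X, Y):
        super(MCCHistory, self).__init__()
        self.X = X       # data to score on each epoch, e.g. Xtest
        self.Y = Y       # the corresponding true labels
        self.mcc = []    # one independently computed MCC per epoch

    def on_epoch_end(self, epoch, logs={}):
        # threshold the continuous predictions at 0.5 to get 0/1 class labels
        preds = (self.model.predict(self.X) > 0.5).astype(int).ravel()
        score = matthews_corrcoef(np.ravel(self.Y), preds)
        self.mcc.append(score)
        print('epoch %d: sklearn MCC = %.4f' % (epoch + 1, score))

I would then pass MCCHistory(Xtest, Ytest) in the callbacks list of model.fit().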

Thanks for your help.

Edit: Here is an example of how to record a history of losses. If I print(history.matthews_correlation), I get a list of the same MCCs that the progress report gives me.

ste

1 Answer


The reason your MCC is negative might be a bug that was recently fixed in the Keras implementation. Check this issue.

One solution could be to reinstall Keras from the GitHub master branch, or to define the metric yourself (as described here), using the implementation fixed in the issue:

import keras.backend as K

def matthews_correlation(y_true, y_pred):
    # round predictions to 0/1 class labels
    y_pred_pos = K.round(K.clip(y_pred, 0, 1))
    y_pred_neg = 1 - y_pred_pos

    y_pos = K.round(K.clip(y_true, 0, 1))
    y_neg = 1 - y_pos

    # confusion-matrix counts
    tp = K.sum(y_pos * y_pred_pos)
    tn = K.sum(y_neg * y_pred_neg)

    fp = K.sum(y_neg * y_pred_pos)
    fn = K.sum(y_pos * y_pred_neg)

    numerator = (tp * tn - fp * fn)
    denominator = K.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

    # epsilon avoids division by zero when a confusion-matrix row or column is empty
    return numerator / (denominator + K.epsilon())
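
To use this during training, pass the function object itself (rather than the 'matthews_correlation' string) to compile. A minimal sketch reusing the model and data from the question:

# pass the custom metric function instead of the built-in string identifier
model.compile(loss='mean_squared_error', optimizer='adam',
              metrics=[matthews_correlation])
model.fit(Xtrain, Ytrain, nb_epoch=10, batch_size=10, callbacks=[history])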
Matt07
  • That explains the difference between the scikit learn MCC and the Keras MCC, thanks for drawing my attention to the new version of Keras. – ste Oct 28 '16 at 11:44
  • If I use this implementation, I get the error: '''ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.''' – tag Jan 07 '19 at 15:38
  • @tag this code was for Keras 1.2. The latest release is 2.2, so the underlying interfaces might have changed. Please refer to the latest docs for the definition of custom metrics: https://keras.io/metrics/ – Matt07 Feb 01 '19 at 11:42
  • 2
    I can verify that this works well on keras with tensorflow 2.0 api. – YOLO Jan 10 '20 at 06:27
  • 1
    The MCC code assumed binary classification with single-column output. Thus, the resulting MCC would be totally wrong if you have a binary classification model with two-column output. Please note that MCC is a correlation coefficient and thus, its value is between -1 and 1. – Tae-Sung Shin Oct 20 '20 at 23:23
  • Correct, it assumes a single-column output, i.e., a binary classification task. Thanks for pointing that out. Regarding the MCC being in [-1, 1], that is correct as well: 0 means no correlation between labels and predictions (random predictions), while -1 means anti-correlation. So in practice it is very unlikely to go below zero during training (your model would have to be worse than a random-guess classifier), especially after a few epochs. If that happens, there is probably a bug somewhere. – Matt07 Oct 21 '20 at 08:38