I want to create a custom metric for Pearson correlation, as defined here.

I'm not sure exactly how to apply it to batches of y_pred and y_true.

What I did:

import tensorflow as tf
from keras import backend as K

def pearson_correlation_f(y_true, y_pred):

    y_true, _ = tf.split(y_true[:, 1:], 2, axis=1)
    y_pred, _ = tf.split(y_pred[:, 1:], 2, axis=1)

    fsp = y_pred - K.mean(y_pred, axis=-1, keepdims=True)
    fst = y_true - K.mean(y_true, axis=-1, keepdims=True)

    corr = K.mean(K.sum(fsp * fst, axis=-1)) / K.mean(
        K.sqrt(K.sum(K.square(fsp), axis=-1) *
               K.sum(K.square(fst), axis=-1)))

    return corr

Is it necessary for me to use keepdims, handle the batch dimension manually, and then take the mean over it? Or does Keras somehow do this automatically?

user3142067

1 Answer

When you use K.mean without an axis, Keras automatically calculates the mean for the entire batch.

And the backend already has standard deviation functions, so it might be cleaner (and perhaps faster) to use them.

If your true data is shaped like (BatchSize, 1), I'd say keepdims is unnecessary. Otherwise I'm not sure, and it would be good to test the results.
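As a quick illustration (plain NumPy standing in for the backend, which follows the same reduction and broadcasting rules), keepdims decides whether the reduced axis survives, which in turn decides whether the subtraction broadcasts:

```python
import numpy as np

# NumPy stand-in for K.mean: keepdims keeps the reduced axis as size 1,
# so the centered values broadcast back against the original shape.
x = np.ones((4, 3))                     # (batch, features)
m = np.mean(x, axis=-1, keepdims=True)  # shape (4, 1)
centered = x - m                        # broadcasts to (4, 3)
# without keepdims the mean has shape (4,), and (4, 3) - (4,) would
# raise a broadcasting error instead of centering each row
```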

(I don't understand why you use split; it also seems unnecessary.)

So, I'd try something like this:

fsp = y_pred - K.mean(y_pred) #K.mean without an axis is a scalar, so it's subtracted from every element of y_pred
fst = y_true - K.mean(y_true)

devP = K.std(y_pred)
devT = K.std(y_true)

return K.mean(fsp*fst)/(devP*devT)
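As a sanity check (not from the answer; plain NumPy mirroring the backend math), the scalar version above reproduces the standard Pearson coefficient, here compared against np.corrcoef:

```python
import numpy as np

# Hypothetical NumPy mirror of the Keras snippet above: same formula,
# checked against NumPy's built-in correlation.
def pearson_np(y_true, y_pred):
    fsp = y_pred - y_pred.mean()
    fst = y_true - y_true.mean()
    return (fsp * fst).mean() / (y_pred.std() * y_true.std())

rng = np.random.default_rng(0)
y_true = rng.normal(size=32)
y_pred = 0.5 * y_true + rng.normal(scale=0.3, size=32)

r = pearson_np(y_true, y_pred)
# matches np.corrcoef because the ddof in covariance and std cancels out
```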

If it's relevant to compute the loss per feature instead of lumping them all into one group:

#original shapes: (batch, 10)

fsp = y_pred - K.mean(y_pred,axis=0) #mean over the batch, keeping the features separate
fst = y_true - K.mean(y_true,axis=0)
    #mean shape: (10,), broadcast against (batch, 10)
    #fsp and fst keep shape (batch, 10)

devP = K.std(y_pred,axis=0)
devT = K.std(y_true,axis=0)
    #dev shape: (10,)

return K.sum(K.mean(fsp*fst,axis=0)/(devP*devT))
    #the mean over the batch makes every tensor in the expression shape (10,)
    #the sum is only there because Keras needs a single loss value

Summing over the ten features or averaging them gives the same result up to a factor of 10. That barely matters to a Keras model, since it only scales the gradient (effectively the learning rate), and most optimizers quickly compensate for it.
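The per-feature version can also be checked in plain NumPy (a hypothetical stand-in, not part of the answer): the summed expression equals the sum of the per-column Pearson correlations.

```python
import numpy as np

# Hypothetical NumPy mirror of the per-feature version above.
rng = np.random.default_rng(1)
y_true = rng.normal(size=(32, 10))                    # (batch, 10)
y_pred = y_true + rng.normal(scale=0.5, size=(32, 10))

fsp = y_pred - y_pred.mean(axis=0)                    # center each feature
fst = y_true - y_true.mean(axis=0)
devP = y_pred.std(axis=0)                             # shape (10,)
devT = y_true.std(axis=0)
loss = ((fsp * fst).mean(axis=0) / (devP * devT)).sum()

# compare against NumPy's correlation, column by column
per_col = [np.corrcoef(y_true[:, j], y_pred[:, j])[0, 1] for j in range(10)]
```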

Daniel Möller
  • I am trying to do the same thing, but in my case I am predicting a single output. I get 'nan' when I use either of the methods above when I try to run model.evaluate(). Can you give me any hints on how to debug this? I can get custom metrics like this to work just fine: def my_mse(y_true, y_pred): return K.mean(K.square(y_pred - y_true), axis=1). I tried putting axis=1 above and still got 'nan'. I also added a check for a zero denominator, returning 0 in that case, to eliminate that possibility. Any help much appreciated. – Zak Keirn Jun 23 '18 at 17:51
  • If you have only one sample, very probably you don't have any deviation or statistics. – Daniel Möller Jun 23 '18 at 18:26
  • Thanks. I just checked the output of model.fit and I can see all the correlations being calculated and they look correct. So evidently the model.evaluate() only does one sample at a time as you suggest. – Zak Keirn Jun 23 '18 at 18:28
  • Usual reasons are invalid data or invalid results in layers. But there are many possibilities. Sometimes, for instance, you get a line full of zeros somewhere, or some impossible division / root, etc. – Daniel Möller Jun 23 '18 at 18:31
  • Based on description of Keras evaluate, it says it does it in default batches of 32 which should have worked? My test vector is much longer. So I am confused about why it seems to work in model.fit() but not in model.evaluate(). – Zak Keirn Jun 23 '18 at 18:41
  • For some reason, model.evaluate() produces 'nan' using the default batch_size of 32, if I set to 50 or 100 or even the entire length of the test data, it works fine. Thanks for your code, it works great. – Zak Keirn Jun 23 '18 at 19:15
  • You got a full zero batch, very probably.... and some operation does not support full zeros there. – Daniel Möller Jun 23 '18 at 19:25
  • It doesn't matter whether "fit" works, unless you say it's exactly the "same data". In which case, "fit" might still be shuffling it and creating valid batches by this. – Daniel Möller Jun 23 '18 at 19:26
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/173687/discussion-between-zak-keirn-and-daniel-moller). – Zak Keirn Jun 23 '18 at 20:08