
The negative log-likelihood for logistic regression is given by […] This is also called the cross-entropy error function.

— Page 246, Machine Learning: A Probabilistic Perspective, 2012

So I tried that, and I found a discrepancy:

from sklearn.metrics import log_loss
y_true = [0, 0, 0, 0]
y_pred = [0.5, 0.5, 0.5, 0.5]
log_loss(y_true, y_pred, labels=[0, 1]) # 0.6931471805599453

from math import log2
def cross_entropy(p, q):
    return -sum([p[i]*log2(q[i]) for i in range(len(p))])
cross_entropy(y_true, y_pred) #-0.0

Why?

1 Answer

First, sklearn.metrics.log_loss applies natural logarithm (math.log or numpy.log) to probabilities, not base-2 logarithm.
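
To see the log-base point in isolation, here is a minimal check using only the standard library (the 0.5 probability is taken from the question's y_pred):

from math import log, log2

# Natural log, as used by sklearn.metrics.log_loss
-log(0.5)   # 0.6931471805599453
# Base-2 log, as used in the question's cross_entropy
-log2(0.5)  # 1.0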

Second, you got -0.0 because every log probability is multiplied by a zero from y_true, so the whole sum vanishes. For the binary case, the per-sample log-loss is

-logP(y_true, y_pred) = -(y_true*log(y_pred) + (1-y_true)*log(1-y_pred))
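
Plugging in one sample from the question (y_true = 0, y_pred = 0.5) shows where the 0.693... comes from; the (1 - y_true) term is exactly what the question's cross_entropy drops:

from math import log

y_t, y_p = 0, 0.5
-(y_t * log(y_p) + (1 - y_t) * log(1 - y_p))  # 0.6931471805599453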

Third, you forgot to take the average of the per-sample log-losses in your code.

from math import log

def bin_cross_entropy(p, q):
    # Average binary cross-entropy: natural log, and both the y and (1 - y) terms
    n = len(p)
    return -sum(p[i]*log(q[i]) + (1-p[i])*log(1-q[i]) for i in range(n)) / n

bin_cross_entropy(y_true, y_pred)  # 0.6931471805599453
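
As a quick check (assuming y_true and y_pred from the question are still in scope), this matches sklearn:

from sklearn.metrics import log_loss

print(bin_cross_entropy(y_true, y_pred))        # 0.6931471805599453
print(log_loss(y_true, y_pred, labels=[0, 1]))  # 0.6931471805599453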
Sanjar Adilov