
I am learning logistic regression in PyTorch, and to understand it better I am defining a custom CrossEntropyLoss as below:

import numpy as np
import torch
import torch.nn as nn

def softmax(x):
    exp_x = torch.exp(x)
    sum_x = torch.sum(exp_x, dim=1, keepdim=True)

    return exp_x/sum_x

def log_softmax(x):
    return torch.exp(x) - torch.sum(torch.exp(x), dim=1, keepdim=True)

def CrossEntropyLoss(outputs, targets):
    num_examples = targets.shape[0]
    batch_size = outputs.shape[0]
    outputs = log_softmax(outputs)
    outputs = outputs[range(batch_size), targets]

    return - torch.sum(outputs)/num_examples
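As a sanity check on the indexing step, here is a minimal sketch (with a made-up 2-example batch) of what outputs[range(batch_size), targets] is meant to do, namely pick each row's entry at its target class:

import torch

# Made-up log-probabilities for a batch of 2 examples and 3 classes (illustration only).
log_probs = torch.tensor([[-0.1, -2.0, -3.0],
                          [-1.5, -0.3, -2.2]])
targets = torch.tensor([0, 1])

# Fancy indexing picks log_probs[0, 0] and log_probs[1, 1].
picked = log_probs[range(2), targets]
print(picked)  # tensor([-0.1000, -0.3000])

# The same selection expressed with torch.gather
picked_gather = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
print(torch.allclose(picked, picked_gather))  # True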

I also built my own logistic regression model (to predict FashionMNIST) as below:

input_dim = 784 # 28x28 FashionMNIST data
output_dim = 10

w_init = np.random.normal(scale=0.05, size=(input_dim,output_dim))
w_init = torch.tensor(w_init, requires_grad=True).float()
b = torch.zeros(output_dim)

def my_model(x):
    bs = x.shape[0]
    return x.reshape(bs, input_dim) @ w_init + b

To validate my custom CrossEntropyLoss, I compared it with nn.CrossEntropyLoss from PyTorch by applying both to the FashionMNIST data as below:

criterion = nn.CrossEntropyLoss()

for X, y in trn_fashion_dl:
    outputs = my_model(X)
    my_outputs = softmax(outputs)

    my_ce = CrossEntropyLoss(my_outputs, y)
    pytorch_ce = criterion(outputs, y)

    print(f'my custom cross entropy: {my_ce.item()}\npytorch cross entropy: {pytorch_ce.item()}')
    break 

My question concerns the results: my_ce (my cross entropy) and pytorch_ce (PyTorch cross entropy) are different:

my custom cross entropy: 9.956839561462402
pytorch cross entropy: 2.378990888595581

I appreciate your help in advance!


2 Answers


There are two bugs in your code.

  1. The log_softmax(x) should be:

def log_softmax(x):
    return torch.log(softmax(x))

  2. When you calculate your own CE loss, you should pass in outputs instead of my_outputs, because the softmax is already computed inside your own CE loss function. It should be:

outputs = my_model(X)
my_ce = CrossEntropyLoss(outputs, y)
pytorch_ce = criterion(outputs, y)

Then you will have identical results.

my custom cross entropy: 3.584486961364746
pytorch cross entropy: 3.584486961364746
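
For completeness, here is a small self-contained check (using random data rather than FashionMNIST, so the loss values themselves are not meaningful) that with both fixes applied the custom loss matches nn.CrossEntropyLoss:

import torch
import torch.nn as nn

def softmax(x):
    exp_x = torch.exp(x)
    return exp_x / torch.sum(exp_x, dim=1, keepdim=True)

def log_softmax(x):
    return torch.log(softmax(x))

def CrossEntropyLoss(outputs, targets):
    num_examples = targets.shape[0]
    outputs = log_softmax(outputs)
    outputs = outputs[range(num_examples), targets]
    return -torch.sum(outputs) / num_examples

torch.manual_seed(0)
logits = torch.randn(8, 10)            # raw model outputs, no softmax applied
targets = torch.randint(0, 10, (8,))

my_ce = CrossEntropyLoss(logits, targets)
pytorch_ce = nn.CrossEntropyLoss()(logits, targets)
print(torch.allclose(my_ce, pytorch_ce))  # True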
zihaozhihao

It seems your log_softmax function is wrong. It should simply be:

def log_softmax(x):
    return torch.log(softmax(x))

But since your softmax is not numerically stable, this version may still overflow or underflow for large inputs. You can improve it as below:

def log_softmax(x):
    return x - torch.logsumexp(x, dim=1, keepdim=True)

Note that I have used the identity log(exp(x_i) / sum_j exp(x_j)) = x_i - log(sum_j exp(x_j)).

Also see https://pytorch.org/docs/stable/torch.html?highlight=logsumexp#torch.logsumexp
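
To illustrate the stability point, here is a small sketch with artificially large logits: the naive log(softmax(x)) produces nan/-inf, while the logsumexp form stays finite and agrees with torch.nn.functional.log_softmax:

import torch
import torch.nn.functional as F

def naive_log_softmax(x):
    exp_x = torch.exp(x)
    return torch.log(exp_x / torch.sum(exp_x, dim=1, keepdim=True))

def stable_log_softmax(x):
    return x - torch.logsumexp(x, dim=1, keepdim=True)

x = torch.tensor([[1000.0, 0.0, -1000.0]])  # large logits make torch.exp overflow to inf

print(naive_log_softmax(x))     # tensor([[nan, -inf, -inf]])
print(stable_log_softmax(x))    # tensor([[    0., -1000., -2000.]])
print(F.log_softmax(x, dim=1))  # matches the stable version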

Umang Gupta
  • thanks, after changing the log_softmax, the two cross entropies became closer but not exactly the same, is this expected? my custom cross entropy: 2.3021483421325684 pytorch cross entropy: 2.4871463775634766 – A.E Oct 13 '19 at 14:30