I'm currently implementing the continuous bag-of-words (CBOW) model using PyTorch. I'm facing some problems when implementing the cross entropy loss, though. Here's the portion of code that's causing the problem:

```python
for idx, sample in enumerate(self.train_data):
    x = torch.tensor(sample[0], dtype=torch.long)
    y = np.zeros(shape=(self.vocab_size)) # self.vocab_size = 85,000
    y[int(sample[1])] = np.float64(1)
    y = torch.tensor(y, dtype=torch.long)

    if torch.cuda.is_available():
        x = x.cuda()
        y = y.cuda()

    optimizer.zero_grad()

    output = self.model(x) # output's shape is the same as self.vocab_size
    loss = criterion(output, y)
    loss.backward()
    optimizer.step()
```

To briefly explain my code: the model averages the embeddings of the context words and applies a linear projection that maps the result to a vector the size of the vocabulary. That vector is then passed through a softmax function.
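
For reference, here is a minimal sketch of the CBOW model described above, assuming PyTorch. The class name `CBOW`, `embed_dim`, and the toy sizes are hypothetical. One deliberate difference from the description: the sketch returns raw logits instead of applying softmax, because `nn.CrossEntropyLoss` already combines `LogSoftmax` and `NLLLoss`, so a softmax inside the model would effectively be applied twice.

```python
import torch
import torch.nn as nn


class CBOW(nn.Module):
    """Hypothetical CBOW sketch: average context embeddings, project to vocabulary logits."""

    def __init__(self, vocab_size: int, embed_dim: int):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embed_dim)
        self.linear = nn.Linear(embed_dim, vocab_size)

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        # context: (batch, context_len) tensor of word indices
        embedded = self.embeddings(context)  # (batch, context_len, embed_dim)
        averaged = embedded.mean(dim=1)      # (batch, embed_dim)
        return self.linear(averaged)         # (batch, vocab_size) raw logits, no softmax


# Toy sizes; the question uses a vocabulary of 85,000.
model = CBOW(vocab_size=1000, embed_dim=32)
context = torch.randint(0, 1000, (4, 6))  # 4 samples, 6 context words each
logits = model(context)
print(logits.shape)  # torch.Size([4, 1000])
```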

The contents of `self.train_data` are (context, target_word) pairs, and `y` is a one-hot encoding of the target word.
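
Since `y` is a one-hot vector, the following sketch (toy sizes, hypothetical target id) shows how it relates to the class-index target that `nn.CrossEntropyLoss` normally expects. The float probability-style target in the last call is only accepted by PyTorch 1.10 or newer; with older versions, use the index form. For a one-hot target, both forms give the same loss.

```python
import torch
import torch.nn.functional as F

vocab_size = 10   # toy vocabulary (the question uses 85,000)
target_index = 3  # hypothetical target word id

logits = torch.randn(1, vocab_size)  # model output for a batch of one sample

# One-hot vector as in the question, but float and with a batch dimension.
one_hot = torch.zeros(1, vocab_size)
one_hot[0, target_index] = 1.0

# Class-index target: shape (batch,), dtype int64.
index_target = one_hot.argmax(dim=1)  # tensor([3])

loss_from_index = F.cross_entropy(logits, index_target)
loss_from_probs = F.cross_entropy(logits, one_hot)  # probability targets need PyTorch >= 1.10
print(torch.allclose(loss_from_index, loss_from_probs))  # True
```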

I'm aware that the second input to `nn.CrossEntropyLoss` is C = # of classes, but I'm not sure where my code went wrong. The vocabulary size is 85,000, so isn't the number of classes 85,000?

If I change the call to

```python
loss = criterion(output, 85000)
```

I get the same error:

```
*** RuntimeError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
```

What am I doing wrong, and how should I understand the input to PyTorch's cross entropy loss?

Thanks.

Sean
  • The second input to `nn.CrossEntropyLoss` should be your class index (i.e. the class of the target variable), not the number of classes. That class index will be a scalar (one integer value), but for training using batches, it will be a tensor of size (batch size). – akshayk07 Nov 16 '19 at 16:30
  • Here is an example of the inputs to PyTorch's cross-entropy loss. Even though the error is different, it also shows the correct input shape, which may be helpful to you: https://stackoverflow.com/a/53458159/7483494 – MBT Nov 16 '19 at 18:29
  • The second input to `CrossEntropyLoss` is **not** the number of classes! The first argument needs to be a float tensor of **size** [batch size, # of classes] and the second argument needs to be a long/int64 tensor of **size** [batch size] with **values** between 0 and # classes **minus 1**. Check the dimensions of `x` and `y` and the values of `y` to make sure they are valid. – jodag Nov 16 '19 at 19:22
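
jodag's checklist can be turned into a small validation helper. `check_ce_inputs` is a hypothetical name, not a PyTorch API, and it only covers the batched class-index form; newer PyTorch versions also accept unbatched input and probability targets, which this helper deliberately rejects.

```python
import torch


def check_ce_inputs(output: torch.Tensor, target: torch.Tensor) -> None:
    """Hypothetical helper: assert the batched class-index rules for nn.CrossEntropyLoss."""
    assert output.dim() == 2, f"output should be [batch, classes], got {tuple(output.shape)}"
    assert output.is_floating_point(), "output should be a float tensor"
    assert target.dtype == torch.long, f"target should be int64, got {target.dtype}"
    assert target.shape == (output.shape[0],), (
        f"target should have shape ({output.shape[0]},), got {tuple(target.shape)}"
    )
    num_classes = output.shape[1]
    assert target.min() >= 0 and target.max() < num_classes, "target values must be in [0, C-1]"


vocab_size = 1000
check_ce_inputs(torch.randn(2, vocab_size), torch.tensor([5, 999]))  # passes silently

# Inputs shaped like the question's: no batch dimension, one-hot style target.
bad_output = torch.randn(vocab_size)
bad_target = torch.zeros(vocab_size, dtype=torch.long)
bad_target[5] = 1
try:
    check_ce_inputs(bad_output, bad_target)
except AssertionError as err:
    print("invalid:", err)
```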

1 Answer

> I'm aware that the second input to `nn.CrossEntropyLoss` is C = # of classes, but I'm not sure where my code went wrong. The vocabulary size is 85,000, so isn't the number of classes 85,000?

The number of classes (nc) may be 85,000, but you also have a batch size (bs), and the target should hold one class index per sample rather than a one-hot vector:

```python
target = torch.randint(nc, (bs,))
```

The `target` represents the true class, while `output` is what you get from the model for a particular input `x`; in your case, `output = self.model(x)`.

In

```python
loss = criterion(output, target)
```
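
A runnable version of the snippet above, assuming PyTorch. `bs = 8` is an arbitrary batch size, and a random tensor stands in for `self.model(x)`.

```python
import torch
import torch.nn as nn

nc = 85000  # number of classes (vocabulary size in the question)
bs = 8      # batch size (arbitrary)

criterion = nn.CrossEntropyLoss()
output = torch.randn(bs, nc, requires_grad=True)  # stand-in for self.model(x): raw logits
target = torch.randint(nc, (bs,))                 # one class index per sample, in [0, nc - 1]

loss = criterion(output, target)  # scalar, averaged over the batch
loss.backward()
print(loss.item())
```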

In other words, `output` is what the model currently predicts, and `target` is what it should predict once training is complete.
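
Applied to the loop in the question, a minimal sketch might look like the following. The toy vocabulary, `train_data`, the SGD learning rate, and the model are hypothetical stand-ins; the model uses `nn.EmbeddingBag(mode="mean")` as a compact substitute for averaging embeddings by hand. The two key changes are that `y` holds the class index rather than a one-hot vector, and both `x` and `y` carry a batch dimension of 1.

```python
import torch
import torch.nn as nn

# Hypothetical toy setup; the question uses vocab_size = 85,000 and its own model/data.
vocab_size, embed_dim = 50, 16
train_data = [([1, 2, 4, 5], 3), ([10, 11, 13, 14], 12)]  # (context, target_word) pairs

model = nn.Sequential(
    nn.EmbeddingBag(vocab_size, embed_dim, mode="mean"),  # averages context embeddings
    nn.Linear(embed_dim, vocab_size),                     # logits; no softmax before the loss
)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for context, target_word in train_data:
    # Batch dimension of 1: x is (1, context_len), y is (1,).
    x = torch.tensor([context], dtype=torch.long, device=device)
    y = torch.tensor([int(target_word)], dtype=torch.long, device=device)  # class index, not one-hot

    optimizer.zero_grad()
    output = model(x)  # (1, vocab_size)
    loss = criterion(output, y)
    loss.backward()
    optimizer.step()
    print(f"loss = {loss.item():.4f}")
```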

prosti