I'm following the multiple choice QA tutorial and trying to modify it slightly to fit my data. My data is exactly the same, except that I have 5 labels instead of 4:
# original data:
from datasets import load_dataset
swag = load_dataset("swag", "regular")
set(swag["train"]['label'])
>>> {0, 1, 2, 3}
# my data:
set(train_dataset["train"]['label'])
>>> {0, 1, 2, 3, 4}
I'm running the code in the tutorial and getting the error:
nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [9,0,0] Assertion `t >= 0 && t < n_classes` failed.
I found from here and here that this can be caused by target values that are out of bounds, which can happen when using nn.CrossEntropyLoss: it expects a torch.LongTensor with values in the range [0, nb_classes-1].
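To make that constraint concrete, here is a minimal sketch of a bounds check on the targets, written in plain Python so it does not need a GPU. The helper name find_out_of_range_targets and the toy values are hypothetical and not part of the tutorial.

```python
def find_out_of_range_targets(targets, n_classes):
    """Return (position, value) pairs for targets outside [0, n_classes - 1]."""
    return [
        (position, target)
        for position, target in enumerate(targets)
        if not 0 <= target < n_classes
    ]


# Toy example: with 4 classes the valid targets are 0, 1, 2 and 3.
bad = find_out_of_range_targets([0, 3, 4, 1], n_classes=4)
print(bad)  # [(2, 4)]
```

Running a check like this on a batch before it reaches the loss should point at the offending label instead of a CUDA assertion.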
I will not copy the entire script from the tutorial since it's in the link above, but I found that the error can be replicated by modifying the DataCollatorForMultipleChoice class so that it produces an extra label value, as follows:
import random
from dataclasses import dataclass
from typing import Optional, Union

import torch
from transformers.tokenization_utils_base import PreTrainedTokenizerBase, PaddingStrategy


@dataclass
class DataCollatorForMultipleChoice:
    """
    Data collator that will dynamically pad the inputs for multiple choice received.
    """

    tokenizer: PreTrainedTokenizerBase
    padding: Union[bool, str, PaddingStrategy] = True
    max_length: Optional[int] = None
    pad_to_multiple_of: Optional[int] = None

    def __call__(self, features):
        label_name = "label" if "label" in features[0].keys() else "labels"
        labels = [feature.pop(label_name) for feature in features]
        labels = [random.choice(range(5)) for _ in range(16)]  # <<<--- ADDING AN EXTRA LABEL HERE: VALUES ARE NOW IN 0-4 INSTEAD OF 0-3
        print(len(labels))
        print(labels)
        batch_size = len(features)
        num_choices = len(features[0]["input_ids"])
        # Flatten to one entry per (example, choice) pair so the tokenizer can pad them together.
        flattened_features = [
            [{k: v[i] for k, v in feature.items()} for i in range(num_choices)] for feature in features
        ]
        flattened_features = sum(flattened_features, [])
        batch = self.tokenizer.pad(
            flattened_features,
            padding=self.padding,
            max_length=self.max_length,
            pad_to_multiple_of=self.pad_to_multiple_of,
            return_tensors="pt",
        )
        # Restore the (batch_size, num_choices, seq_len) layout expected by the model.
        batch = {k: v.view(batch_size, num_choices, -1) for k, v in batch.items()}
        batch["labels"] = torch.tensor(labels, dtype=torch.int64)
        return batch
Then when I run the trainer I get:
16 # batch
[0, 0, 2, 1, 1, 1, 0, 4, 0, 4, 3, 0, 0, 0, 1, 1] # labels
... nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [7,0,0] Assertion `t >= 0 && t < n_classes` failed.
I tried changing the number of labels in the model:
# original:
# model = AutoModelForMultipleChoice.from_pretrained("bert-base-uncased")
# my modification:
model = AutoModelForMultipleChoice.from_pretrained("bert-base-uncased", num_labels=5)
but I got the same error.
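One possible explanation for why num_labels=5 changes nothing, as far as I can tell from the BertForMultipleChoice source: the head outputs a single score per (example, choice) pair, and those scores are reshaped to (batch_size, num_choices) before the loss is computed. If that reading is right, the number of classes the loss sees is the number of candidates per example in input_ids, not num_labels. The sketch below mimics that reshape in plain Python; group_logits_by_example and the toy features are hypothetical and only illustrate the shape arithmetic.

```python
def group_logits_by_example(flat_logits, num_choices):
    """Mimic logits.view(-1, num_choices): one row of scores per example."""
    if len(flat_logits) % num_choices != 0:
        raise ValueError("number of logits must be a multiple of num_choices")
    return [
        flat_logits[start:start + num_choices]
        for start in range(0, len(flat_logits), num_choices)
    ]


# Hypothetical batch: 2 examples, each tokenized with 4 candidate endings.
features = [
    {"input_ids": [[101, 7, 102], [101, 8, 102], [101, 9, 102], [101, 10, 102]]},
    {"input_ids": [[101, 11, 102], [101, 12, 102], [101, 13, 102], [101, 14, 102]]},
]
num_choices = len(features[0]["input_ids"])  # same expression as in the collator

# One score per (example, choice) pair, as a single-output head would produce.
flat_logits = [0.1 * i for i in range(len(features) * num_choices)]
rows = group_logits_by_example(flat_logits, num_choices)

n_classes = len(rows[0])  # classes seen by the loss = choices per example
labels = [3, 4]
invalid = [label for label in labels if not 0 <= label < n_classes]
print(n_classes, invalid)
```

Under this assumption, label 4 only becomes valid once each example is tokenized with five candidates, so that num_choices is 5.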
The script runs just fine with my data if I modify the added line from above to:
labels = [random.choice(range(4)) for _ in range(16)]  # note that the labels are now in 0-3 and not 0-4