
Here is my code. I suspect the error comes from the padding and truncation step.

```python
from datasets import load_dataset
from transformers import DistilBertTokenizerFast
import tensorflow as tf

# Load the GoEmotions train/validation/test splits
dataset = load_dataset("go_emotions")

train_text = dataset['train']['text']
test_text = dataset['test']['text']
val_text = dataset['validation']['text']
train_labels = dataset['train']['labels']
test_labels = dataset['test']['labels']
val_labels = dataset['validation']['labels']

tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')

# Tokenize each split; padding='max_length' pads every sequence to the
# tokenizer's model_max_length (512 for this checkpoint)
train_encodings = tokenizer(train_text, padding='max_length')
val_encodings = tokenizer(val_text, padding='max_length')
test_encodings = tokenizer(test_text, padding='max_length')

# Build tf.data datasets of (features, labels) pairs
train_dataset = tf.data.Dataset.from_tensor_slices((
    dict(train_encodings),
    train_labels
))
val_dataset = tf.data.Dataset.from_tensor_slices((
    dict(val_encodings),
    val_labels
))
test_dataset = tf.data.Dataset.from_tensor_slices((
    dict(test_encodings),
    test_labels
))
```

The error I'm getting is:

```
ValueError: Can't convert non-rectangular Python sequence to Tensor.
```
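This message means that at least one of the nested Python lists passed to `tf.data.Dataset.from_tensor_slices` has rows of different lengths, so TensorFlow cannot build a dense tensor from it. Below is a minimal pure-Python sketch for finding which input is ragged. It assumes the encodings and labels are plain nested lists, as they are when the tokenizer is called without `return_tensors`. `row_lengths` and `ragged_fields` are hypothetical helpers, and the toy data is illustrative, not taken from go_emotions.

```python
from typing import Dict, List, Sequence


def row_lengths(rows: Sequence[Sequence[int]]) -> set:
    """Return the set of distinct row lengths in a list of lists."""
    return {len(row) for row in rows}


def ragged_fields(columns: Dict[str, Sequence[Sequence[int]]]) -> List[str]:
    """Names of the columns whose rows do not all have the same length.

    A column listed here cannot be converted to a dense tf.Tensor as-is.
    """
    return sorted(name for name, rows in columns.items() if len(row_lengths(rows)) > 1)


# Toy data shaped like the tokenizer output plus multi-label targets:
# input_ids/attention_mask are padded to a common length, labels are not.
columns = {
    "input_ids": [[101, 2023, 102, 0], [101, 2307, 2154, 102]],
    "attention_mask": [[1, 1, 1, 0], [1, 1, 1, 1]],
    "labels": [[27], [2, 27]],
}

print(ragged_fields(columns))  # ['labels']
```

On the real data, something like `ragged_fields({**dict(train_encodings), 'labels': train_labels})` would most likely point at `labels`. In go_emotions each example carries a list of one or more label ids, so the label lists differ in length no matter how the text is padded.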

I tried various combinations of the `padding` and `truncation` parameters, but without success.
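Padding and truncation only change the tokenizer output, so they cannot make variable-length label lists rectangular. A common remedy for multi-label data is to encode each example's label list as a fixed-length multi-hot vector with one slot per class. The sketch below assumes 28 classes, which is what the default `simplified` configuration of go_emotions should have (27 emotions plus neutral). The value can be checked with `dataset['train'].features['labels'].feature.num_classes`. `to_multi_hot` is a hypothetical helper.

```python
from typing import List, Sequence


def to_multi_hot(label_lists: Sequence[Sequence[int]], num_classes: int) -> List[List[float]]:
    """Turn variable-length lists of class ids into fixed-length 0/1 vectors."""
    vectors = []
    for ids in label_lists:
        vec = [0.0] * num_classes
        for i in ids:
            vec[i] = 1.0  # mark every class present in this example
        vectors.append(vec)
    return vectors


# Toy example using 28 classes (assumed go_emotions 'simplified' label count).
multi_hot = to_multi_hot([[27], [2, 27]], num_classes=28)
print(len(multi_hot), len(multi_hot[0]), len(multi_hot[1]))  # 2 28 28
print(multi_hot[1][2], multi_hot[1][27], sum(multi_hot[1]))  # 1.0 1.0 2.0
```

With the real data, the multi-hot vectors would replace the raw label lists, e.g. `tf.data.Dataset.from_tensor_slices((dict(train_encodings), to_multi_hot(train_labels, 28)))`. The model would then typically use a sigmoid output with binary cross-entropy instead of softmax. Adding `truncation=True` and a smaller `max_length` (say 128) to the tokenizer calls is still reasonable to cap sequence length and avoid padding every comment to 512 tokens, but that is not what raises this error. If one label per example is acceptable, keeping only the first id of each list is a simpler alternative.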
