I have a pandas DataFrame with a text column that I want to tokenize using the Hugging Face transformers library. When I run the code, the "Tokenized_Text" column comes out empty. How can this be solved? The code is as follows:
import pandas as pd
import torch
from transformers import AutoTokenizer, AutoModel
df = pd.read_csv('subject_messages.csv')
model_ckpt = "dccuchile/bert-base-spanish-wwm-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
df["Tokenized_Text"] = tokenizer(df["Message"].to_list())
df.to_csv("tokenized_telegram_messages.csv", index=False)
At first I thought I was initializing the tokenizer incorrectly, but the checkpoint I am using was trained specifically for Spanish, so that does not seem to be the cause. I expect the output to contain a column with the tokenized text for each message.
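For reference, a minimal sketch of one possible fix. Calling the tokenizer on a list returns a single dict-like object (with keys such as "input_ids") rather than one value per row, which is presumably why assigning it directly to a column does not give the expected result. The helper name add_token_columns and the extra "Input_Ids" column are hypothetical names introduced here for illustration; filling missing messages with empty strings is also an assumption about the data.

```python
import pandas as pd


def add_token_columns(df, tokenizer, text_col="Message"):
    """Return a copy of df with per-row token ids and token strings.

    Hypothetical helper: `tokenizer` is assumed to behave like a Hugging Face
    tokenizer (callable on a list of strings, with convert_ids_to_tokens).
    """
    # Missing messages become empty strings so the tokenizer only sees str.
    texts = df[text_col].fillna("").astype(str).tolist()
    # The result is dict-like; "input_ids" holds one list of ids per text.
    encoded = tokenizer(texts)
    ids_per_row = list(encoded["input_ids"])
    tokens_per_row = [tokenizer.convert_ids_to_tokens(ids) for ids in ids_per_row]

    out = df.copy()
    # Wrap in a Series so each row receives exactly one list.
    out["Input_Ids"] = pd.Series(ids_per_row, index=out.index)
    out["Tokenized_Text"] = pd.Series(tokens_per_row, index=out.index)
    return out


def main():
    # Imported here so the helper can be exercised without transformers installed.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("dccuchile/bert-base-spanish-wwm-uncased")
    df = pd.read_csv("subject_messages.csv")
    add_token_columns(df, tokenizer).to_csv("tokenized_telegram_messages.csv", index=False)
```

Calling main() would reproduce the original script with the per-row assignment. Note that to_csv writes each list as its string representation, so joining the tokens with spaces before saving may be more convenient if the CSV is read back later.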