I am running a sentence transformer model and trying to truncate the tokenized input, but the truncation does not appear to be working. My code is:
from transformers import AutoModel, AutoTokenizer

model_name = "sentence-transformers/paraphrase-MiniLM-L6-v2"
model = AutoModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

text = "..."  # a long input document (~900 tokens once tokenized)
text_tokens = tokenizer(text, padding=True, truncation=True, return_tensors="pt")
text_embedding = model(**text_tokens)["pooler_output"]
I keep getting the following warning:
Token indices sequence length is longer than the specified maximum sequence length
for this model (909 > 512). Running this sequence through the model will result in
indexing errors
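To illustrate what I expect truncation=True to do, here is the behavior I am after, sketched with a plain list of stand-in token ids (the numbers are made up, not real tokenizer output):

```python
# Toy sketch of the truncation I expect, not the real tokenizer.
token_ids = list(range(909))        # pretend the tokenizer produced 909 ids
max_length = 512                    # the model's maximum sequence length
truncated = token_ids[:max_length]  # what I expect truncation=True to yield
print(len(truncated))               # 512
```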
Why is setting truncation=True not truncating my text to the model's maximum sequence length of 512?