
Given a transformer model on huggingface, how do I find the maximum input sequence length?

For example, here I want to truncate to the model's maximum length: `tokenizer(examples["text"], padding="max_length", truncation=True)`. How do I find the value to use for `max_length`?

I need to know because I am trying to resolve this warning: "Asking to pad to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no padding."

JobHunter69

1 Answer


The maximum input sequence length is stored in the model's configuration as `max_position_embeddings` (shown here for the `facebook/opt-125m` model):

from transformers import AutoConfig

model_name = "facebook/opt-125m"
# Loading only the config avoids downloading the model weights
config = AutoConfig.from_pretrained(model_name)

# The number of position embeddings is the longest sequence the model accepts
max_length = config.max_position_embeddings
print("Maximum input sequence length:", max_length)
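To connect this back to the question, here is a minimal sketch of passing the retrieved value explicitly to the tokenizer call, which avoids the "no predefined maximum length" warning. This assumes the `facebook/opt-125m` model from the answer; the resulting length of 2048 is specific to the OPT family.

```python
from transformers import AutoConfig, AutoTokenizer

model_name = "facebook/opt-125m"  # swap in your own model
config = AutoConfig.from_pretrained(model_name)
max_length = config.max_position_embeddings

tokenizer = AutoTokenizer.from_pretrained(model_name)
encoded = tokenizer(
    ["some example text"],
    padding="max_length",
    truncation=True,
    max_length=max_length,  # pass the value explicitly so the tokenizer never guesses
)
# Every sequence is now padded/truncated to exactly max_length tokens
print(len(encoded["input_ids"][0]))
```

Passing `max_length=` explicitly also sidesteps the sentinel issue from the comments below, where `tokenizer.model_max_length` can return a huge placeholder value when the tokenizer config records no limit.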

Phoenix
  • I tried it in my case: `tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m", use_fast=False); max_length = tokenizer.model_max_length; print("Maximum input sequence length:", max_length)` But it outputs nonsense: 1000000000000000019884624838656 – JobHunter69 Jun 24 '23 at 20:57
  • @JobHunter69 You can try this code now – Phoenix Jun 24 '23 at 21:05