
I'm running code that uses pad_to_max_length=True and everything works fine. I only get the following warning:

FutureWarning: The pad_to_max_length argument is deprecated and will be removed in a future version, use padding=True or padding='longest' to pad to the longest sequence in the batch, or use padding='max_length' to pad to a max length. In this case, you can give a specific length with max_length (e.g. max_length=45) or leave max_length to None to pad to the maximal input size of the model (e.g. 512 for Bert).

But when I change pad_to_max_length = True to padding='max_length' I get this error:

RuntimeError: stack expects each tensor to be equal size, but got [60] at entry 0 and [64] at entry 6

How can I update the code to the new version? Did I misunderstand something in the warning message?

This is my encoder:

encoding = self.tokenizer.encode_plus(
    poem,
    add_special_tokens=True,
    max_length=60,
    return_token_type_ids=False,
    pad_to_max_length=True,
    return_attention_mask=True,
    return_tensors='pt',
)

1 Answer


It seems that the documentation is not complete enough!

You should also add truncation=True to mimic the old pad_to_max_length=True behavior; without it, inputs longer than max_length are no longer truncated, so the batch contains tensors of different lengths and stacking fails.

Like this:

encoding = self.tokenizer.encode_plus(
    poem,
    add_special_tokens=True,
    max_length=self.max_len,
    return_token_type_ids=False,
    padding='max_length',
    truncation=True,
    return_attention_mask=True,
    return_tensors='pt',
)
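To see why this fixes the RuntimeError, here is a minimal sketch (assuming the Hugging Face transformers library and a BERT tokenizer; the poems list and model name are placeholders) showing that with padding='max_length' plus truncation=True every encoding has the same length, so stacking in a batch works:

import torch
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
poems = ["a short poem", "a much longer poem " * 20]

encodings = [
    tokenizer.encode_plus(
        poem,
        add_special_tokens=True,
        max_length=60,
        return_token_type_ids=False,
        padding='max_length',
        truncation=True,  # without this, long inputs keep their full length
        return_attention_mask=True,
        return_tensors='pt',
    )
    for poem in poems
]

# Every input_ids tensor is now [1, 60], so stacking (e.g. in a DataLoader collate) works.
input_ids = torch.stack([e['input_ids'].squeeze(0) for e in encodings])
print(input_ids.shape)  # torch.Size([2, 60])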