
I use the following code to load the saved model:

config = T5Config.from_pretrained(
    model_name_or_path,
    cache_dir=model_args.cache_dir,
    revision=model_args.model_revision,
    use_auth_token=True if model_args.use_auth_token else None,
)
config.train_task_adapters = adapter_args.train_task_adapters

# Set tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    model_name_or_path,
    cache_dir=model_args.cache_dir,
    use_fast=model_args.use_fast_tokenizer,
    revision=model_args.model_revision,
    use_auth_token=True if model_args.use_auth_token else None,
)

# Initialize the model
model = T5ForConditionalGeneration.from_pretrained(
    model_name_or_path,
    from_tf=bool(".ckpt" in model_name_or_path),
    config=config,
    cache_dir=model_args.cache_dir,
    revision=model_args.model_revision,
    use_auth_token=True if model_args.use_auth_token else None,
    adapter_config=adapter_config,
)

However, I receive the following error:

RuntimeError: Error(s) in loading state_dict for T5ForConditionalGeneration:
    size mismatch for encoder.model_embeddings.weight: copying a param with shape torch.Size([32128, 768]) from checkpoint, the shape in current model is torch.Size([32138, 768]).
    size mismatch for decoder.model_embeddings.weight: copying a param with shape torch.Size([32128, 768]) from checkpoint, the shape in current model is torch.Size([32138, 768]).
exit 1

1 Answer


Usually you don't encounter any problems when loading a model to which you've added some extra tokens during training. In my case, it was the pad_to_multiple_of parameter that caused the trouble. It pads the embedding matrix up to a multiple of the given value, which is claimed to make more efficient use of modern NVIDIA GPUs, so I used it when I created the model for training and then happily forgot about it:

model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=16)
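
The rounding itself is easy to reproduce (a minimal sketch; padded_vocab_size is a hypothetical helper that mirrors the rounding resize_token_embeddings applies when pad_to_multiple_of is set):

import math

def padded_vocab_size(num_tokens: int, pad_to_multiple_of: int) -> int:
    # Round the vocabulary size up to the nearest multiple,
    # as resize_token_embeddings does when pad_to_multiple_of is set.
    return math.ceil(num_tokens / pad_to_multiple_of) * pad_to_multiple_of

# e.g. a tokenizer with 32100 tokens padded to a multiple of 16:
print(padded_vocab_size(32100, 16))  # 32112 -- larger than len(tokenizer)

So the saved embedding matrix ends up slightly larger than len(tokenizer), which is exactly the kind of small size mismatch the error above complains about.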

But as it seems, the current API (4.33.0.dev0) struggles to load such models through from_pretrained: the saved embedding matrix is padded beyond len(tokenizer), so the shapes no longer match. The workaround is to build the model from its config, repeat the same resize, and then load the weights manually:

from os import path

import torch
from transformers import AutoConfig, GPT2Tokenizer, T5ForConditionalGeneration

MODEL_CHECKPOINT = ''  # your directory here
config_path = path.join(MODEL_CHECKPOINT, 'config.json')
weights_path = path.join(MODEL_CHECKPOINT, 'pytorch_model.bin')

tokenizer = GPT2Tokenizer.from_pretrained(MODEL_CHECKPOINT)

config = AutoConfig.from_pretrained(config_path)
model = T5ForConditionalGeneration(config)
# Repeat the same resize as during training so the shapes match the checkpoint
model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=16)
model.load_state_dict(torch.load(weights_path, map_location=torch.device('cpu')))

Which outputs: <All keys matched successfully>
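
If you want plain from_pretrained to work on later loads, one option (a sketch; 't5-fixed' is a placeholder directory, not part of the original setup) is to re-save the now correctly sized model and tokenizer:

# Re-save the checkpoint so a later from_pretrained sees matching shapes.
model.save_pretrained('t5-fixed')      # 't5-fixed' is a placeholder path
tokenizer.save_pretrained('t5-fixed')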
