
I'm following this tutorial https://www.youtube.com/watch?v=V1-Hm2rNkik&list=LL&index=2 to fine-tune. The only difference is that I'm using the GPT2Tokenizer and GPT2LMHeadModel instead of BERT.

When I get to the training part (11:53), I get the following error:

ValueError: `Checkpoint` was expecting model to be a trackable object (an object derived from `Trackable`), got GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
). If you believe this object should be trackable (i.e. it is part of the TensorFlow Python API and manages state), please open an issue.

I'm not sure how to fix this. This is the training code I used from the video:

from transformers import TFTrainingArguments, TFTrainer, GPT2LMHeadModel

training_args = TFTrainingArguments(
    output_dir='./results',
    num_train_epochs=2,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10,
)

with training_args.strategy.scope():
#     model = AutoModelForCausalLM.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

trainer = TFTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
)

trainer.train()
  • You are loading a PyTorch model but using the TensorFlow trainer. Use [TFGPT2LMHeadModel](https://huggingface.co/docs/transformers/model_doc/gpt2#transformers.TFGPT2LMHeadModel) instead. – cronoik Jun 05 '23 at 06:32
  • @cronoik TFGPT2LMHeadModel got rid of the error, but it introduced a new one: "TypeError: Expected string passed to parameter 'y' of op 'NotEqual', got -100 of type 'int' instead. Error: Expected string, but got -100 of type 'int'". Do you know about this? – jam Jun 06 '23 at 18:45
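
A minimal sketch of the fix cronoik suggests, assuming the same `training_args`, `train_dataset`, and `test_dataset` as in the question: swap the PyTorch class for its TensorFlow counterpart, `TFGPT2LMHeadModel`, so that the object TFTrainer checkpoints is a TensorFlow `Trackable` rather than a `torch.nn.Module`.

from transformers import TFGPT2LMHeadModel, TFTrainer

with training_args.strategy.scope():
    # TF counterpart of GPT2LMHeadModel; TFTrainer wraps the model in a
    # tf.train.Checkpoint, which can only track TensorFlow objects
    model = TFGPT2LMHeadModel.from_pretrained("gpt2")

trainer = TFTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,   # datasets as prepared in the question
    eval_dataset=test_dataset,
)
trainer.train()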

0 Answers