I'm following this tutorial https://www.youtube.com/watch?v=V1-Hm2rNkik&list=LL&index=2 to fine-tune a model. The only difference is that I'm using GPT2Tokenizer and GPT2LMHeadModel instead of BERT.
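For context, this is roughly how I load the tokenizer and tokenize my data before training (simplified; the actual preprocessing follows the video, and names like train_texts/test_texts are just placeholders):

from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# tokenize the raw text lists (placeholder variable names)
train_encodings = tokenizer(train_texts, truncation=True, padding=True)
test_encodings = tokenizer(test_texts, truncation=True, padding=True)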
When I get to the training part (11:53), I get the following error:
ValueError: `Checkpoint` was expecting model to be a trackable object (an object derived from `Trackable`), got GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
). If you believe this object should be trackable (i.e. it is part of the TensorFlow Python API and manages state), please open an issue.
I'm not sure how to fix this. This is the training code I used from the video:
from transformers import GPT2LMHeadModel, TFTrainer, TFTrainingArguments

training_args = TFTrainingArguments(
    output_dir='./results',
    num_train_epochs=2,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10,
)

# load the model inside the strategy scope, as in the video
with training_args.strategy.scope():
    # model = AutoModelForCausalLM.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

trainer = TFTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset
)

trainer.train()
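In case it's relevant, train_dataset and test_dataset are tf.data.Dataset objects built roughly like this (a simplified sketch of my setup; for causal LM fine-tuning I reuse the input ids as labels):

import tensorflow as tf

# build tf.data datasets from the tokenizer output (the BatchEncoding behaves like a dict)
train_dataset = tf.data.Dataset.from_tensor_slices((
    dict(train_encodings),
    train_encodings['input_ids']
))
test_dataset = tf.data.Dataset.from_tensor_slices((
    dict(test_encodings),
    test_encodings['input_ids']
))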