Here is the question: I am loading my PyTorch model (the one with the best dev-set result during training) from a checkpoint file for evaluation. Even though I call model.eval() and run inference under torch.no_grad(), the accuracy on the dev set is 1-2% lower than what I got during training.
I have tried:
- printing the state dict just before PyTorch saves the best model during training and comparing it with the state dict I get after loading; they are the same.
- checking my code, which uses a lot of dropout and LayerNorm layers; I found no errors.
- loading the model on the same GPU that was used for training, which did not help.
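For the first check, comparing printed state dicts by eye can miss small differences. A minimal sketch of a tensor-by-tensor comparison follows; `diff_state_dicts` is a hypothetical helper name and the `nn.Linear` layer is only a toy stand-in for the real model.

```python
import torch
from torch import nn


def diff_state_dicts(sd_a, sd_b):
    """Return the keys that are missing from one dict or whose tensors differ."""
    mismatched = []
    for key in sorted(set(sd_a) | set(sd_b)):
        if key not in sd_a or key not in sd_b:
            mismatched.append(key)
        elif not torch.equal(sd_a[key].cpu(), sd_b[key].cpu()):
            mismatched.append(key)
    return mismatched


# Toy stand-in for the real model (assumption, not the original MixLM).
layer = nn.Linear(4, 2)

# Clone every tensor so the saved copy is independent of the live parameters.
saved = {k: v.clone() for k, v in layer.state_dict().items()}
same = diff_state_dicts(saved, layer.state_dict())

# Change the weights in place, then compare again.
with torch.no_grad():
    layer.weight.add_(1.0)
changed = diff_state_dicts(saved, layer.state_dict())
```

An empty list means the two state dicts hold identical tensors under identical keys.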
My working environment:
- Python 3.6.10, PyTorch 1.7.1 (with CUDA 11.1)
- GPU: NVIDIA 2080Ti
- the same seed (NumPy and PyTorch) is used during training and evaluation
- model.eval() and torch.no_grad() are used on the dev set during both training and evaluation
- the same dev set and the same metric calculation method.
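For reference, a rough sketch of what "the same seed" could look like in code. `seed_everything` is a hypothetical helper name, and the cuDNN flags are an extra assumption that the original setup may or may not use.

```python
import random

import numpy as np
import torch


def seed_everything(seed):
    """Seed the Python, NumPy and PyTorch random number generators."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    # Silently ignored when CUDA is not available.
    torch.cuda.manual_seed_all(seed)
    # Assumption: ask cuDNN for deterministic kernels as well.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


seed_everything(1234)
a = torch.rand(3)
seed_everything(1234)
b = torch.rand(3)  # same values as `a` because the seed was reset
```

Seeding only fixes the random number streams. It does not make two different sets of weights produce the same metric.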
Here is my pseudocode for training (the original code is too long to post):
# load my data.
train_dataset = FinetuningDataset(vocab=vocab, domains=domains, data_files=data_files, max_len=data_config["max_len"], giga_embedding_vocab=giga_embedding.word2id)
val_dataset = FinetuningDataset(vocab, domains=domains, data_files=dev_data_path, max_len=data_config['max_len'], giga_embedding_vocab=giga_embedding.word2id)
sp_collator = SortPadCollator(sort_key=lambda x:x[0], ignore_indics=[0])
train_iter = DataLoader(dataset=train_dataset,
                        batch_size=data_config["batch_size"],
                        shuffle=data_config["shuffle"],
                        collate_fn=sp_collator)
val_iter = DataLoader(dataset=val_dataset,
                      batch_size=data_config["batch_size"],
                      shuffle=data_config["shuffle"],
                      collate_fn=sp_collator)
adatrans = AdaTrans(vocab=vocab, config=model_config, domain_size=len(domains))
adatrans.load_state_dict(torch.load('ckpt_adatrans/litebert_1e-3_50cls_cuda2.pt'))
model = MixLM(adatrans=adatrans, vocab=vocab, config=model_config, giga_embedding=giga_embedding)
# this is my loss function during training.
loss_fn_dct = {"mask_loss": neg_log_likelihood_loss, "emb_mse_loss":nn.MSELoss(reduction='none'), "domain_cls_loss":nn.NLLLoss(reduction='none')}
metrics_fn_dct = {"mask_metrics":accuracy}
# build a trainer.
trainer = ftTrainer(loss_fn_dct=loss_fn_dct, metrics_fn_dct=metrics_fn_dct, config=trainer_config)
# get the best result on the dev set and save it to checkpoint.pt
best_res, best_state_dict = trainer.train(model=model, train_iter=train_iter, val_iter=val_iter, optimizer=trainer_config['optimizer'], device=trainer_config['device'])
print("best result:: ", best_res)
trainer.save(best_state_dict, trainer_config['model_path'])
In trainer.py, I keep the best state dict and return it:
model.eval()
for dev_batch in val_iter:
    with torch.no_grad():
        # self.val() runs the model's forward function and returns the prediction result.
        dev_res = self.val(dev_batch, model, device)
        dev_loss += dev_res['loss'].item()
# this call gets the result metric (the one that drops during evaluation).
dev_metric = model.domain_biaffine._attachment_scores.get_metric(reset=True)
if dev_metric['UAS'] > best_UAS:
    best_UAS = dev_metric['UAS']
    best_res, best_state_dict = dev_metric, model.state_dict()
print("dev_loss: ", dev_loss / cnt_iter)
print("dev metric: ", dev_metric)
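One detail of the snippet above that may be worth checking: `model.state_dict()` returns tensors that share storage with the live parameters, so a dictionary kept this way keeps changing as training continues. The sketch below uses a toy `nn.Linear` model and a plain SGD step (both stand-ins, not the real MixLM or trainer) to contrast keeping the reference with taking a `copy.deepcopy` snapshot. Whether this explains the drop described here is an assumption.

```python
import copy

import torch
from torch import nn

# Toy stand-ins for the real model and optimizer (assumptions).
model = nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# "Best" epoch: keep the state dict in two different ways.
live_ref = model.state_dict()                 # shares storage with the parameters
snapshot = copy.deepcopy(model.state_dict())  # independent copy of the tensors

# Training continues for one more step after the "best" epoch.
loss = model(torch.ones(2, 3)).sum()
loss.backward()
optimizer.step()

# The plain reference now holds the updated weights, not the "best" ones.
ref_follows_model = torch.equal(live_ref["weight"], model.weight)
# The deep copy still holds the weights from the "best" epoch.
snapshot_is_frozen = not torch.equal(snapshot["weight"], model.weight)
```

If this applies, the checkpoint written at the end of training would contain the weights of the last epoch rather than the best one. That would also be consistent with the saved and loaded state dicts comparing as equal.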
In evaluation.py, I just load checkpoint.pt and make predictions:
test_dataset = FinetuningDataset(vocab=vocab, domains=domains, data_files=data_files, max_len=data_config["max_len"], giga_embedding_vocab=giga_embedding.word2id)
sp_collator = SortPadCollator(sort_key=lambda x:x[0], ignore_indics=[0])
test_iter = DataLoader(dataset=test_dataset,
                       batch_size=data_config["batch_size"],
                       shuffle=False,
                       collate_fn=sp_collator)
adatrans = AdaTrans(vocab=vocab, config=model_config, domain_size=len(domains))
model = MixLM(adatrans=adatrans, vocab=vocab, config=model_config, giga_embedding=giga_embedding)
# load pytorch checkpoint.pt
model.load_state_dict(torch.load(data_config['model_path'], map_location=torch.device('cuda:1')), strict=True)
trainer = ftTrainer(config=trainer_config, vocab=vocab, id2word=giga_embedding.id2word)
# this line makes predictions: it runs model.forward and prints the metric (computed the same way as in the trainer.py snippet).
trainer.inference(model=model, test_iter=test_iter, device=trainer_config['device'])
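A quick sanity check that could be run on the evaluation side is sketched below: confirm that no submodule is still in training mode and that two forward passes over the same batch agree. The `nn.Sequential` model and the helper name `modules_in_train_mode` are hypothetical stand-ins, not part of the original code.

```python
import torch
from torch import nn

# Toy stand-in with the layer types mentioned above (assumption).
model = nn.Sequential(nn.Linear(8, 8), nn.LayerNorm(8), nn.Dropout(p=0.5))


def modules_in_train_mode(m):
    """List the names of all submodules whose `training` flag is still True."""
    return [name for name, sub in m.named_modules() if sub.training]


model.eval()
still_training = modules_in_train_mode(model)

# With dropout disabled, repeated forward passes on the same input should match.
x = torch.randn(4, 8)
with torch.no_grad():
    out_1 = model(x)
    out_2 = model(x)
deterministic = torch.equal(out_1, out_2)
```

If `still_training` is non-empty, or the two outputs differ, some layer is still behaving stochastically at evaluation time.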
I have been searching on Google for a long time but have found nothing helpful. This is really bothering me. Could anyone help me with it? Thanks in advance!