PyTorch model gives lower dev set accuracy after save/load?

Question

When I evaluate my PyTorch model, I load it from the checkpoint file that holds the best dev-set result from training. Although I call model.eval() and run inference inside torch.no_grad(), the accuracy I get on the dev set is 1-2% lower than the accuracy reported for the same dev set during training.

I have tried:

- Printing the state dict right before saving the best model during training and comparing it with the state dict after loading; they are the same.
- Checking my code, which uses lots of dropout and LayerNorm layers; I found no errors.
- Loading the model on the same GPU, which did not help.
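Comparing printed tensors by eye can easily miss differences, so a programmatic check is more reliable. The helper below (`state_dicts_equal` is a hypothetical name, and the toy `nn.Sequential` stands in for `MixLM`) compares keys and tensors exactly with `torch.equal`. Note that such a check only shows that what was loaded matches what was written; it cannot show that the written weights are the ones that produced the best dev score.

```python
import torch
import torch.nn as nn


def state_dicts_equal(sd_a, sd_b):
    """Return True if both state dicts have the same keys and bit-identical tensors."""
    if sd_a.keys() != sd_b.keys():
        return False
    # torch.equal requires the same shape and exactly equal elements (no tolerance)
    return all(torch.equal(sd_a[k].cpu(), sd_b[k].cpu()) for k in sd_a)


# Hypothetical toy model standing in for MixLM
def make_model():
    return nn.Sequential(nn.Linear(4, 8), nn.LayerNorm(8), nn.Dropout(0.1), nn.Linear(8, 2))


model = make_model()
other = make_model()
other.load_state_dict(model.state_dict())
print(state_dicts_equal(model.state_dict(), other.state_dict()))  # True
```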

My working environment:

- Python 3.6.10, PyTorch 1.7.1 (with CUDA 11.1)
- GPU: NVIDIA 2080 Ti
- The same seed (NumPy and PyTorch) during training and evaluation
- model.eval() and torch.no_grad() on the dev set during both training and evaluation
- The same dev set and the same metric calculation method
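For reference, a fuller seeding routine usually also covers Python's `random` module and cuDNN kernel selection. The sketch below (`seed_everything` is a hypothetical helper name, and the toy model is an assumption) also shows that once `model.eval()` turns dropout into the identity, the seed no longer affects predictions, so seeding by itself is unlikely to explain a gap seen at evaluation time.

```python
import random

import numpy as np
import torch
import torch.nn as nn


def seed_everything(seed):
    """Seed Python, NumPy and PyTorch RNGs and ask cuDNN for deterministic kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)           # seeds the CPU and all CUDA devices
    torch.cuda.manual_seed_all(seed)  # explicit; harmless when CUDA is unavailable
    # cuDNN autotuning can select different, non-deterministic kernels between runs
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


seed_everything(42)
model = nn.Sequential(nn.Linear(4, 8), nn.Dropout(0.5), nn.Linear(8, 2))
x = torch.randn(3, 4)

model.eval()  # Dropout now passes inputs through unchanged
with torch.no_grad():
    out1 = model(x)
    seed_everything(7)  # a different seed has no effect in eval mode
    out2 = model(x)
print(torch.equal(out1, out2))  # True
```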

Here is my pseudocode for training (the original is too long to post):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

# load my data
train_dataset = FinetuningDataset(vocab=vocab, domains=domains, data_files=data_files, max_len=data_config["max_len"], giga_embedding_vocab=giga_embedding.word2id)

val_dataset = FinetuningDataset(vocab, domains=domains, data_files=dev_data_path, max_len=data_config['max_len'], giga_embedding_vocab=giga_embedding.word2id)

sp_collator = SortPadCollator(sort_key=lambda x: x[0], ignore_indics=[0])
train_iter = DataLoader(dataset=train_dataset,
                        batch_size=data_config["batch_size"],
                        shuffle=data_config["shuffle"],
                        collate_fn=sp_collator)
val_iter = DataLoader(dataset=val_dataset,
                      batch_size=data_config["batch_size"],
                      shuffle=data_config["shuffle"],
                      collate_fn=sp_collator)

# encoder initialized from a pretrained checkpoint
adatrans = AdaTrans(vocab=vocab, config=model_config, domain_size=len(domains))
adatrans.load_state_dict(torch.load('ckpt_adatrans/litebert_1e-3_50cls_cuda2.pt'))
model = MixLM(adatrans=adatrans, vocab=vocab, config=model_config, giga_embedding=giga_embedding)

# loss functions used during training
loss_fn_dct = {"mask_loss": neg_log_likelihood_loss, "emb_mse_loss": nn.MSELoss(reduction='none'), "domain_cls_loss": nn.NLLLoss(reduction='none')}
metrics_fn_dct = {"mask_metrics": accuracy}

# build a trainer
trainer = ftTrainer(loss_fn_dct=loss_fn_dct, metrics_fn_dct=metrics_fn_dct, config=trainer_config)
# train, keep the state dict with the best dev-set result, then save it to checkpoint.pt
best_res, best_state_dict = trainer.train(model=model, train_iter=train_iter, val_iter=val_iter, optimizer=trainer_config['optimizer'], device=trainer_config['device'])
print("best result: ", best_res)
trainer.save(best_state_dict, trainer_config['model_path'])
```
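A minimal round-trip check, using a small stand-in model instead of `MixLM` and an in-memory buffer instead of `checkpoint.pt` (both assumptions), suggests that saving and reloading a state dict on its own reproduces eval-mode outputs exactly. If the same holds for the real model, the drop more likely comes from which weights were saved or how the dev data is fed, not from `torch.save`/`torch.load` themselves.

```python
import io

import torch
import torch.nn as nn


def make_model():
    # Hypothetical toy architecture standing in for MixLM
    return nn.Sequential(nn.Linear(4, 8), nn.LayerNorm(8), nn.Dropout(0.5), nn.Linear(8, 2))


torch.manual_seed(0)
model = make_model()
model.eval()  # disable dropout so outputs are deterministic
x = torch.randn(5, 4)
with torch.no_grad():
    before = model(x)

# Save only the state dict; the buffer plays the role of checkpoint.pt
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# Rebuild the same architecture, then load the weights.
# map_location="cpu" remaps tensors that may have been saved from a GPU.
restored = make_model()
restored.load_state_dict(torch.load(buffer, map_location="cpu"), strict=True)
restored.eval()
with torch.no_grad():
    after = restored(x)

print(torch.equal(before, after))  # True
```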

In trainer.py, I keep the state dict with the best dev result and return it:

```python
model.eval()
dev_loss, cnt_iter = 0.0, 0
with torch.no_grad():
    for dev_batch in val_iter:
        # self.val() runs the model's forward function and returns the prediction result
        dev_res = self.val(dev_batch, model, device)
        dev_loss += dev_res['loss'].item()
        cnt_iter += 1
# compute the result metric (this is the number that drops during evaluation)
dev_metric = model.domain_biaffine._attachment_scores.get_metric(reset=True)
if dev_metric['UAS'] > best_UAS:
    best_UAS = dev_metric['UAS']
    best_res, best_state_dict = dev_metric, model.state_dict()

print("dev_loss: ", dev_loss / cnt_iter)
print("dev metric: ", dev_metric)
```
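One pitfall that matches these symptoms (an assumption; the post does not confirm it): `model.state_dict()` returns tensors that share storage with the model's parameters, so `best_state_dict = model.state_dict()` keeps changing as training continues, and the dict saved at the end holds the last epoch's weights rather than the best epoch's. Printing that dict just before saving and comparing it with the loaded one would then show them as identical. The toy sketch below (an `nn.Linear` and one SGD step, both assumptions) shows the aliasing and the common `copy.deepcopy` remedy; writing the checkpoint to disk with `torch.save` at the moment a new best is found avoids the issue as well.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Pretend the current epoch has the best dev score so far
best_alias = model.state_dict()                # shares storage with the live parameters
best_copy = copy.deepcopy(model.state_dict())  # independent snapshot
snapshot_weight = model.weight.detach().clone()

# One more training step, standing in for later epochs
loss = model(torch.randn(4, 2)).pow(2).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()  # updates the parameters in place

# The alias silently follows the update; the deep copy keeps the old weights
print(torch.equal(best_alias["weight"], snapshot_weight))  # False
print(torch.equal(best_copy["weight"], snapshot_weight))   # True
```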

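It may also be worth checking how `_attachment_scores` is updated, which the post does not show. AllenNLP-style `Metric` objects accumulate counts on every forward call until `get_metric(reset=True)` is called; if the training forward pass also updates this metric and it is not reset before the dev loop, the "dev" score logged during training mixes in training batches, while a fresh metric in `trainer.inference` sees dev data only. The class below is a hypothetical stand-in for such an accumulating metric, not the real `_attachment_scores`.

```python
class AccumulatingAccuracy:
    """Hypothetical accumulating metric: counts grow on every call until reset."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def __call__(self, predictions, gold):
        self.correct += sum(p == g for p, g in zip(predictions, gold))
        self.total += len(gold)

    def get_metric(self, reset=False):
        acc = self.correct / self.total if self.total else 0.0
        if reset:
            self.correct, self.total = 0, 0
        return {"acc": acc}


metric = AccumulatingAccuracy()
metric([1, 1, 1, 1], [1, 1, 1, 1])     # a training batch also updates the metric
metric([1, 0, 1, 0], [1, 1, 1, 1])     # dev batch: 2 of 4 correct
mixed = metric.get_metric(reset=True)  # {'acc': 0.75}: training counts leak in

metric([1, 1, 1, 1], [1, 1, 1, 1])     # next epoch's training batch
metric.get_metric(reset=True)          # discard training counts before the dev loop
metric([1, 0, 1, 0], [1, 1, 1, 1])     # same dev batch
dev_only = metric.get_metric(reset=True)  # {'acc': 0.5}
```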
In evaluation.py, I simply load checkpoint.pt and make predictions:

```python
import torch
from torch.utils.data import DataLoader

test_dataset = FinetuningDataset(vocab=vocab, domains=domains, data_files=data_files, max_len=data_config["max_len"], giga_embedding_vocab=giga_embedding.word2id)

sp_collator = SortPadCollator(sort_key=lambda x: x[0], ignore_indics=[0])

test_iter = DataLoader(dataset=test_dataset,
                       batch_size=data_config["batch_size"],
                       shuffle=False,
                       collate_fn=sp_collator)

adatrans = AdaTrans(vocab=vocab, config=model_config, domain_size=len(domains))
model = MixLM(adatrans=adatrans, vocab=vocab, config=model_config, giga_embedding=giga_embedding)

# load the PyTorch checkpoint.pt
model.load_state_dict(torch.load(data_config['model_path'], map_location=torch.device('cuda:1')), strict=True)

trainer = ftTrainer(config=trainer_config, vocab=vocab, id2word=giga_embedding.id2word)
# run model.forward on each batch and print the metric (computed as in the trainer.py snippet)
trainer.inference(model=model, test_iter=test_iter, device=trainer_config['device'])
```
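Batch composition is another thing worth ruling out (again an assumption): in training, `val_iter` is built with `shuffle=data_config["shuffle"]`, while `test_iter` uses `shuffle=False`, so dev examples may land in different batches and be padded to different lengths. LayerNorm normalizes each position independently, so padding does not leak through it, but unmasked attention or pooling lets padded positions change a sequence's result. The toy mean-pooling function below (`mean_pool` is a hypothetical name) shows the effect and the masked fix.

```python
import torch


def mean_pool(x, mask=None):
    """Average token vectors over the sequence axis; if `mask` is given, skip padding."""
    if mask is None:
        return x.mean(dim=1)
    mask = mask.unsqueeze(-1).to(x.dtype)  # (batch, seq_len, 1)
    return (x * mask).sum(dim=1) / mask.sum(dim=1)


torch.manual_seed(0)
seq = torch.randn(1, 3, 4)                              # one sequence of 3 tokens, hidden size 4
padded = torch.cat([seq, torch.zeros(1, 2, 4)], dim=1)  # same sequence zero-padded to length 5
mask = torch.tensor([[1, 1, 1, 0, 0]])                  # 1 = real token, 0 = padding

alone = mean_pool(seq)
leaky = mean_pool(padded)         # padding changes the result
masked = mean_pool(padded, mask)  # padding is ignored
print(torch.allclose(alone, leaky))   # False
print(torch.allclose(alone, masked))  # True
```

A quick diagnostic on the real model along the same lines is to run one dev example alone and inside a longer padded batch and compare the predictions.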

I have searched Google for a long time but found nothing helpful, and this is really bothering me. Could anyone help me with it? Thanks in advance!
