How to use trained BERT model checkpoints for prediction?

Question

I trained BERT on SQuAD 2.0 using BERT-master/run_squad.py and got model.ckpt.data, model.ckpt.meta, and model.ckpt.index (F1 score: 81) in the output directory, along with predictions.json and other files:

python run_squad.py \
  --vocab_file=$BERT_LARGE_DIR/vocab.txt \
  --bert_config_file=$BERT_LARGE_DIR/bert_config.json \
  --init_checkpoint=$BERT_LARGE_DIR/bert_model.ckpt \
  --do_train=True \
  --train_file=$SQUAD_DIR/train-v2.0.json \
  --do_predict=True \
  --predict_file=$SQUAD_DIR/dev-v2.0.json \
  --train_batch_size=24 \
  --learning_rate=3e-5 \
  --num_train_epochs=2.0 \
  --max_seq_length=384 \
  --doc_stride=128 \
  --output_dir=gs://some_bucket/squad_large/ \
  --use_tpu=True \
  --tpu_name=$TPU_NAME \
  --version_2_with_negative=True

I copied model.ckpt.meta, model.ckpt.index, and model.ckpt.data to the $BERT_LARGE_DIR directory and changed the run_squad.py flags as follows, so that it only predicts answers and does not train on a dataset:

python run_squad.py \
  --vocab_file=$BERT_LARGE_DIR/vocab.txt \
  --bert_config_file=$BERT_LARGE_DIR/bert_config.json \
  --init_checkpoint=$BERT_LARGE_DIR/model.ckpt \
  --do_train=False \
  --train_file=$SQUAD_DIR/train-v2.0.json \
  --do_predict=True \
  --predict_file=$SQUAD_DIR/dev-v2.0.json \
  --train_batch_size=24 \
  --learning_rate=3e-5 \
  --num_train_epochs=2.0 \
  --max_seq_length=384 \
  --doc_stride=128 \
  --output_dir=gs://some_bucket/squad_large/ \
  --use_tpu=True \
  --tpu_name=$TPU_NAME \
  --version_2_with_negative=True

It throws a "bucket directory/model.ckpt does not exist" error.

How can I use the checkpoints generated during training for prediction?

Ashwin Geet D'Sa · Accepted Answer · 2019-10-30T10:04:54.683

Trained checkpoints are usually written to the directory given by the --output_dir parameter during training (gs://some_bucket/squad_large/ in your case). Every checkpoint has a number in its name, for example model.ckpt-12345. Find the checkpoint with the highest number, then set the --init_checkpoint parameter for evaluation/prediction to the output directory plus that last saved checkpoint. In your case it should be something like --init_checkpoint=gs://some_bucket/squad_large/model.ckpt-<highest number>.
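To illustrate picking the highest-numbered checkpoint, here is a minimal Python sketch. The helper name latest_checkpoint_prefix and the file listing are hypothetical (they are not part of run_squad.py), and the sketch assumes the checkpoint files follow the usual model.ckpt-<number>.index / .meta / .data-... naming.

```python
import re

# Hypothetical helper, not part of BERT's run_squad.py.
# Assumes checkpoint files are named model.ckpt-<number>.<suffix>.
_CKPT_RE = re.compile(r"^(model\.ckpt-(\d+))\.(?:index|meta|data-\d+-of-\d+)$")


def latest_checkpoint_prefix(filenames):
    """Return the checkpoint prefix with the highest number, or None."""
    best_step, best_prefix = -1, None
    for name in filenames:
        match = _CKPT_RE.match(name)
        if match and int(match.group(2)) > best_step:
            best_step, best_prefix = int(match.group(2)), match.group(1)
    return best_prefix


# Example listing of an output directory (made-up file names for illustration).
files = [
    "checkpoint",
    "model.ckpt-1000.index",
    "model.ckpt-1000.meta",
    "model.ckpt-1000.data-00000-of-00001",
    "model.ckpt-12345.index",
    "model.ckpt-12345.meta",
    "model.ckpt-12345.data-00000-of-00001",
    "predictions.json",
]

prefix = latest_checkpoint_prefix(files)
# The flag takes the prefix only, without the .index/.meta/.data suffix.
print("--init_checkpoint=gs://some_bucket/squad_large/" + prefix)
```

Note that the value passed to --init_checkpoint is the prefix (model.ckpt-12345), not any one of the three files. If TensorFlow is available, tf.train.latest_checkpoint(output_dir) should give the same prefix by reading the checkpoint file in that directory.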

score 0 · Answer 2 · answered Jun 28 '19 at 20:39

In the second command, I think the init_checkpoint flag should be:

--init_checkpoint=$BERT_LARGE_DIR/bert_model.ckpt

as in the one above, and not --init_checkpoint=$BERT_LARGE_DIR/model.ckpt.

If the problem persists, are you using the multi_cased_L-12_H-768_A-12 pre-trained model?

Giorgio


I am using the cased_L-24_H-1024_A-16 pre-trained model. I will let you know the results. – Jeeva Bharathi Jul 01 '19 at 04:30
This loaded the pre-trained model, not the trained one. The other answer worked: to use the trained model, we have to specify the checkpoint number. – Jeeva Bharathi Jul 01 '19 at 06:11
