0

I'm trying to train a YOLO model with the PASCAL VOC dataset using Tensorflow on Google Cloud ML Engine but my training job keeps hanging at "Restoring parameters from /root.../yolo_tiny.ckpt".

Is there a reason for this? I've been waiting for the past 4-5 hours.

Logs

Steven Chen
  • 397
  • 1
  • 6
  • 19
  • Hi Steven, did you include the checkpoint file in the trainer package that was submitted as part of your training job? From the logs, it seems like TensorFlow was trying to read the checkpoint file from where your trainer package was installed. Also, did your job only have master (single replica)? You can share the project number and job id via cloudml-feedback@google.com so that we can take a closer look. – Guoqing Xu Oct 25 '17 at 09:06
  • @GuoqingXu Yes, the job only has a master (single replica). I also just sent an email over with more detailed information. Thanks! – Steven Chen Oct 25 '17 at 17:04

0 Answers0