
I execute:

gcloud beta ml jobs submit training ${JOB_NAME} --config config.yaml

and after about five minutes the job fails with this error:

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 232, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 228, in main
    run_training()
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 129, in run_training
    data_sets = input_data.read_data_sets(FLAGS.train_dir, FLAGS.fake_data)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py", line 212, in read_data_sets
    with open(local_file, 'rb') as f:
IOError: [Errno 2] No such file or directory: 'gs://my-bucket/mnist/train/train-images.gz'

The strange thing is that, as far as I can tell, the file exists at that URL.
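For what it's worth, one quick way to confirm that from Python (a sketch, assuming the tensorflow package is installed locally and has access to the bucket) is TensorFlow's file_io module, which does understand gs:// paths:

from tensorflow.python.lib.io import file_io

# Plain open() only knows the local filesystem, hence the IOError above;
# file_io, on the other hand, can see objects in the bucket:
print(file_io.file_exists('gs://my-bucket/mnist/train/train-images.gz'))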


2 Answers


This error usually indicates that you are using a multi-regional GCS bucket for your output. To avoid it, use a regional GCS bucket: regional buckets provide the stronger consistency guarantees that are needed to avoid this kind of error.
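To check which kind of bucket you have, you can look at its location (a minimal sketch using the google-cloud-storage client library; the bucket name is a placeholder):

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('my-bucket')
# Multi-regional buckets report a multi-region location such as 'US' or 'EU';
# regional buckets report a specific region such as 'US-CENTRAL1'.
print(bucket.location)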

For more information about properly setting up GCS buckets for Cloud ML, please refer to the Cloud ML docs.


Ordinary file I/O does not know how to deal with GCS gs:// paths. You need something like:

from tensorflow.python.lib.io import file_io

first_data_file = args.train_files[0]
file_stream = file_io.FileIO(first_data_file, mode='r')

# run experiment
model.run_experiment(file_stream)
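
The stream returned by file_io.FileIO behaves like an ordinary file object, so you can hand it to anything that accepts one. For example (a minimal sketch; the pandas usage and the CSV path are my own assumptions, not part of the original code):

import pandas as pd
from tensorflow.python.lib.io import file_io

# pandas happily reads from the file-like object, so the data never
# needs to be copied out of GCS first (hypothetical bucket/path):
with file_io.FileIO('gs://my-bucket/data/train.csv', mode='r') as f:
    train_df = pd.read_csv(f)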

Ironically, though, you can copy files from the gs:// bucket into the job's local working directory, which your program CAN then actually see:

# presentation_mplstyle_path holds the gs:// path to the style file in the bucket
with file_io.FileIO(presentation_mplstyle_path, mode='r') as input_f:
    with file_io.FileIO('presentation.mplstyle', mode='w+') as output_f:
        output_f.write(input_f.read())

mpl.pyplot.style.use(['./presentation.mplstyle'])

And finally, copying a file from the local working directory back to a gs:// bucket:

with file_io.FileIO(report_name, mode='r') as input_f:
    with file_io.FileIO(job_dir + '/' + report_name, mode='w+') as output_f:
        output_f.write(input_f.read())
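
If you end up doing these copies in more than one place, a small helper keeps the boilerplate in one spot (a sketch; the function name and the example paths are mine, not from the original code):

from tensorflow.python.lib.io import file_io

def copy_file(src, dst):
    # file_io.FileIO handles both local paths and gs:// paths,
    # so this works in either direction.
    with file_io.FileIO(src, mode='r') as input_f:
        with file_io.FileIO(dst, mode='w+') as output_f:
            output_f.write(input_f.read())

# Pull a file down from the bucket, or push results back up (hypothetical paths):
copy_file('gs://my-bucket/styles/presentation.mplstyle', 'presentation.mplstyle')
copy_file('report.html', job_dir + '/report.html')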

Should be easier IMO.
