In GCP AI Platform, I am trying to write simple logs to a file in addition to saving a tf.keras model. Saving the model with tf.saved_model.save works, but writing to a .txt file using with open(file) as out: does not, and raises this:

FileNotFoundError: [Errno 2] No such file or directory: 'gs://my-test-bucket-001/keras-job-dir/mnist_model_export/results.txt'

Can anyone explain why AI Platform resolves the file path differently in these two cases?

My request essentially looks like this (see https://cloud.google.com/ai-platform/docs/getting-started-keras):

...
JOB_DIR=gs://my-test-bucket-001/keras-job-dir
gcloud ai-platform jobs submit training $JOB_NAME \
 --package-path trainer/ \
 --module-name trainer.task \
 --region $REGION \
 --python-version 3.7 \
 --runtime-version 2.1 \
 --job-dir $JOB_DIR \
 --stream-logs

The relevant part of trainer/task.py script is this:

   # use this path to save outputs
   export_path = os.path.join(args.job_dir, 'mnist_model_export')
   # this works
   tf.saved_model.save(mnist_model, export_path)

   # this fails when included
   with open(os.path.join(export_path, 'results.txt'), 'a+') as out:
      log_str = "Job finished! {}\n".format(time.strftime('%Y-%m-%d %H:%M:%S'))
      out.write(log_str)
user71111

1 Answer

When you use with open(os.path.join(export_path, 'results.txt'), 'a+') as out:, you are using the local file system. Because export_path is a gs:// path, open reports that the file does not exist: a gs:// path is not available locally. You need a file handler that supports reading from and writing to GCS buckets, for example FileIO.
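To see the local-file-system behaviour in isolation, here is a minimal sketch that needs neither TensorFlow nor GCS access. The helper name try_builtin_open and the bucket name are hypothetical, chosen only for illustration. The built-in open has no notion of URL schemes, so it treats the gs:// string as a relative local path whose first directory is literally named "gs:".

```python
def try_builtin_open(path):
    """Return the error raised by open(path, 'a+'), or None if it succeeded."""
    try:
        with open(path, "a+"):
            return None
    except OSError as err:
        return err


# Hypothetical bucket name, used only for illustration.
err = try_builtin_open("gs://hypothetical-demo-bucket/job-dir/results.txt")

# On a machine with no local directory called "gs:", this reports the same
# "No such file or directory" failure as in the question.
print(type(err).__name__, err)
```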

Replace:

with open(os.path.join(export_path, 'results.txt'), 'a+') as out:
    log_str = "Job finished! {}\n".format(time.strftime('%Y-%m-%d %H:%M:%S'))
    out.write(log_str)

With:

from tensorflow.python.lib.io import file_io
with file_io.FileIO(os.path.join(export_path, 'results.txt'), mode='a+') as out:
    log_str = "Job finished! {}\n".format(time.strftime('%Y-%m-%d %H:%M:%S'))
    out.write(log_str)

You may also want to check out the other logging options available.
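If the FileIO version still fails with a 404 NotFoundError like the one reported in the comments below, one possible cause is the append mode: opening a GCS object that does not exist yet with 'a+' may make TensorFlow try to read it first. This is an assumption and is not confirmed in this thread. The sketch below avoids append mode by reading any existing content and rewriting the file with mode 'w'. It uses the public tf.io.gfile API, and the helper name append_job_log is hypothetical. It also assumes export_path already exists, as it does after tf.saved_model.save.

```python
import os
import time

import tensorflow as tf


def append_job_log(export_path, filename="results.txt"):
    """Add one 'Job finished!' line to a log file under export_path.

    Works for local paths and gs:// paths. Append is emulated with
    read-then-rewrite, so the file is never opened in append mode.
    """
    log_path = os.path.join(export_path, filename)

    # Read the previous content only if the file (or object) already exists.
    previous = ""
    if tf.io.gfile.exists(log_path):
        with tf.io.gfile.GFile(log_path, mode="r") as existing:
            previous = existing.read()

    log_str = "Job finished! {}\n".format(time.strftime("%Y-%m-%d %H:%M:%S"))

    # Mode 'w' creates the file if it is missing and overwrites it otherwise.
    with tf.io.gfile.GFile(log_path, mode="w") as out:
        out.write(previous + log_str)
    return log_path
```

In trainer/task.py this would be called as append_job_log(export_path) right after tf.saved_model.save(mnist_model, export_path). Rewriting the whole file is only reasonable for small logs, and it is not safe if several workers write to the same path at once.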

gogasca
  • Thank you for the reply. Unfortunately, it doesn't help in my case: `ERROR 2020-03-13 17:59:42 -0400 service tensorflow.python.framework.errors_impl.NotFoundError: Error executing an HTTP request: HTTP response code 404 with body 'NoSuchKey The specified key does not exist. No such object: my-test-bucket/keras-job-dir/mnist_model_export/results.txt'` – user71111 Mar 13 '20 at 22:01
  • Basic question: are you passing gs://my-test-bucket/keras-job-dir/mnist_model_export/results.txt or my-test-bucket/keras-job-dir/mnist_model_export/results.txt as the filename? – gogasca Mar 13 '20 at 22:49
  • I do have 'gs://' in front, but the parser drops it when it prints the error message. The print below outputs 'gs://my-test-bucket/keras-job-dir/v2w_var_....', and then the write fails with the same error: `export_path = os.path.join(args.job_dir, 'v2w_var_{}'.format(time.strftime('%Y-%m-%d_%H-%M-%S')))` >>> `tf.saved_model.save(mod_var, export_path)` >>> `print(export_path)` >>> `with file_io.FileIO(os.path.join(export_path, 'results.txt'), mode='a+') as out:` >>> `log_str = "Job finished! {}\n".format(time.strftime('%Y-%m-%d %H:%M:%S'))` >>> `out.write(log_str)` Thank you! – user71111 Mar 15 '20 at 17:07
  • Are you still stuck on this problem? I can try to repro tomorrow – gogasca Mar 31 '20 at 04:27
  • Thanks for getting back! I ended up only saving the model with `tf.saved_model.save`. I am still interested in knowing the solution. – user71111 Apr 02 '20 at 01:56