
I am quite new to Google Cloud Platform and I am trying to train a model on a TPU. I am following this tutorial to set up the TPU with Google Colab. All the code below comes from the tutorial.

These are the steps I have taken so far:

import datetime
import json
import os
import pprint
import random
import string
import sys
import tensorflow as tf

assert 'COLAB_TPU_ADDR' in os.environ, 'ERROR: Not connected to a TPU runtime; please see the first cell in this notebook for instructions!'
TPU_ADDRESS = 'grpc://' + os.environ['COLAB_TPU_ADDR']
print('TPU address is => ', TPU_ADDRESS)

from google.colab import auth
auth.authenticate_user()
with tf.Session(TPU_ADDRESS) as session:
  print('TPU devices:')
  pprint.pprint(session.list_devices())

  # Upload credentials to TPU.
  with open('/content/adc.json', 'r') as f:
    auth_info = json.load(f)
  tf.contrib.cloud.configure_gcs(session, credentials=auth_info)
  # Now credentials are set for all future sessions on this TPU.

Output:

TPU address is =>  grpc://10.4.89.154:8470

Provide my BUCKET name and OUTPUT DIRECTORY name:

BUCKET = 'my_xlnet' #@param {type:"string"}
assert BUCKET, '*** Must specify an existing GCS bucket name ***'
output_dir_name = 'xlnet_output' #@param {type:"string"}
BUCKET_NAME = 'gs://{}'.format(BUCKET)
OUTPUT_DIR = 'gs://{}/{}'.format(BUCKET,output_dir_name)
tf.gfile.MakeDirs(OUTPUT_DIR)
print('***** Model output directory: {} *****'.format(OUTPUT_DIR))

Move the pretrained model to GCS bucket:

!gsutil mv /content/xlnet_extension_tf/model/xlnet_cased_L-24_H-1024_A-16 $BUCKET_NAME

Output:

...
Operation completed over 5 objects/1.3 GiB.   

Then run the main code:

!python /content/xlnet_extension_tf/run_coqa.py \
--use_tpu=True \
--tpu_name=grpc://10.4.89.154:8470 \
--spiece_model_file=$BUCKET_NAME/xlnet_cased_L-24_H-1024_A-16/spiece.model \
--model_config_path=$BUCKET_NAME/xlnet_cased_L-24_H-1024_A-16/xlnet_config.json \
--init_checkpoint=$BUCKET_NAME/xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt \
...

Then I got this error:

OSError: Not found: "gs://my_xlnet/xlnet_cased_L-24_H-1024_A-16/spiece.model": No such file or directory Error #2

This is the GCS bucket screen: [image]

I don't understand why this error occurs, since the pretrained model was moved to the bucket successfully.

Does anyone know how to fix this?

Update:

The run_coqa.py file: https://github.com/stevezheng23/xlnet_extension_tf/blob/master/run_coqa.py

huy

2 Answers


Can you post the part where run_coqa.py is opening the file?

It seems like you're trying to open it with a regular `os` file call, whereas a `gs://` path has to be opened through GCP's SDK.
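To illustrate the point, here is a minimal sketch, not taken from `run_coqa.py`. It assumes the file is being read by code that only understands local paths, and that TensorFlow 1.13 or later is installed so that `tf.io.gfile` is available. The helper names (`is_gcs_path`, `local_copy_target`, `stage_locally`) and the `/content` staging directory are hypothetical and used only for illustration.

```python
import os
import posixpath


def is_gcs_path(path):
    """Return True for paths that use the gs:// scheme."""
    return path.startswith("gs://")


def local_copy_target(gcs_path, local_dir="/content"):
    """Map gs://bucket/dir/file to <local_dir>/file (hypothetical layout)."""
    return os.path.join(local_dir, posixpath.basename(gcs_path))


def stage_locally(path, local_dir="/content"):
    """Copy a gs:// object to local disk and return the local path.

    Local paths are returned unchanged. TensorFlow is imported lazily so
    the two helpers above can be used without it.
    """
    if not is_gcs_path(path):
        return path
    import tensorflow as tf  # assumption: TF >= 1.13, which provides tf.io.gfile
    target = local_copy_target(path, local_dir)
    tf.io.gfile.copy(path, target, overwrite=True)
    return target


remote = "gs://my_xlnet/xlnet_cased_L-24_H-1024_A-16/spiece.model"

# OS-level calls treat the URI as a relative local path, so they cannot
# see the object even though it exists in the bucket.
print(os.path.exists(remote))

# Where the staged copy would be written.
print(local_copy_target(remote))
```

If that assumption holds for the line that loads `spiece.model`, one option would be to call `stage_locally(...)` once in the notebook and pass the returned local path to `--spiece_model_file`, while leaving the checkpoint and config on `gs://`.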

aldarisbm
  • I have just updated my post with the link of the `run_coqa.py` file. – huy Aug 08 '20 at 04:18
  • I'm a bit out of my depth here, but this seems relevant: it looks like you need a specific implementation to be able to open `gs://` files. See the answer [here](https://stackoverflow.com/questions/45585104/save-keras-modelcheckpoints-in-google-cloud-bucket) – aldarisbm Aug 08 '20 at 05:03
  • huy: Did aldarisbm's suggestion help you? – MrTech Aug 19 '20 at 22:38

This tutorial was created by a third party. I am not aware of any known issue that would currently stop this code from running.

MrTech