I'm running into a problem when trying to fit my model on a TPU on Kaggle.
The TPU is already initialized:
import tensorflow as tf

try:
    # Detect the TPU; raises ValueError when none is available
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
    print(f'Running on TPU {tpu.master()}')
except ValueError:
    tpu = None

if tpu:
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.experimental.TPUStrategy(tpu)
else:
    # Fall back to the default strategy when no TPU is found
    strategy = tf.distribute.get_strategy()

AUTO = tf.data.experimental.AUTOTUNE
REPLICAS = strategy.num_replicas_in_sync
print(f'REPLICAS: {REPLICAS}')
But when I try to fit my model, this error is raised:
{{function_node __inference_train_function_64094}} failed to connect to all addresses
GRPC error information:{"created":"@1609444822.190891136","description":"Failed to pick
subchannel","file":"third_party/grpc/src/core/ext/filters/client_channel/client_channel.cc",
"file_line":3959,"referenced_errors": [{"created":"@1609444822.190889693"
,"description":"failed to connect to all addresses", […]
[[{{node MultiDeviceIteratorGetNextFromShard}}]] [[RemoteCall]] [[IteratorGetNextAsOptional]]