I am trying to use a TPU on GCP with TensorFlow 2.1 and the Keras API. Unfortunately, I am stuck after creating the TPU node: my VM seems to "see" the TPU but cannot connect to it.
The code I am using:
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(TPU_name)
print('Running on TPU ', resolver.master())
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.experimental.TPUStrategy(resolver)
The code hangs at line 3: I receive a few messages and then nothing, so I do not know what the issue could be. I therefore suspect a connection problem between the VM and the TPU.
The messages:
2020-04-22 15:46:25.383775: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2020-04-22 15:46:25.992977: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2020-04-22 15:46:26.042269: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5636e4947610 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-04-22 15:46:26.042403: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-04-22 15:46:26.080879: I tensorflow/core/common_runtime/process_util.cc:147] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
E0422 15:46:26.263937297 2263 socket_utils_common_posix.cc:198] check for SO_REUSEPORT: {"created":"@1587570386.263923266","description":"SO_REUSEPORT unavailable on compiling system","file":"external/grpc/src/core/lib/iomgr/socket_utils_common_posix.cc","file_line":166}
2020-04-22 15:46:26.269134: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:300] Initialize GrpcChannelCache for job worker -> {0 -> 10.163.38.90:8470}
2020-04-22 15:46:26.269192: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:300] Initialize GrpcChannelCache for job localhost -> {0 -> localhost:32263}
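One way to test the connection suspicion independently of TensorFlow would be a plain TCP check from the VM against the worker address shown in the log (10.163.38.90:8470). The following is only a minimal sketch: `can_connect` is a hypothetical helper name, not part of TensorFlow, and a successful TCP connection would only show that the port is reachable, not that the TPU runtime itself is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `can_connect("10.163.38.90", 8470)` from the VM and getting `False` would point to a network problem between the VM and the TPU rather than to the TensorFlow code.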
Moreover, I am using the "Deep Learning" image from GCP, so I should not need to install anything, right?
Has anyone had the same issue with TF 2.1? P.S.: the same code works fine on Kaggle and Colab.