I created a VM instance in GCP with the PyTorch XLA environment, and I also created a TPU VM with tpu-vm-pt-2.0.
I SSHed into the VM instance and activated the conda environment with pytorch-xla. However, when I run a sample script to test the TPU, it fails with the following error:
2023-04-17 19:35:38.550666: F 5184 tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:1362] Non-OK-status: session.Run({tensorflow::Output(result, 0)}, &outputs) status: UNIMPLEMENTED: method "RunStep" not implemented
*** Begin stack trace ***
tsl::CurrentStackTrace()
xla::XrtComputationClient::InitializeAndFetchTopology(std::string const&, int, std::string const&, tensorflow::ConfigProto const&)
xla::XrtComputationClient::InitializeDevices(std::unique_ptr<tensorflow::tpu::TopologyProto, std::default_delete<tensorflow::tpu::TopologyProto> >)
xla::XrtComputationClient::XrtComputationClient(xla::XrtComputationClient::Options, std::unique_ptr<tensorflow::tpu::TopologyProto, std::default_delete<tensorflow::tpu::TopologyProto> >)
xla::ComputationClient::Create()
xla::ComputationClient::Get()
PyCFunction_Call
_PyObject_MakeTpCall
_PyEval_EvalFrameDefault
_PyFunction_Vectorcall
_PyObject_GenericGetAttrWithDict
_PyEval_EvalFrameDefault
_PyEval_EvalCodeWithName
_PyFunction_Vectorcall
_PyEval_EvalCodeWithName
PyEval_EvalCode
PyRun_SimpleFileExFlags
Py_BytesMain
__libc_start_main
End stack trace
Aborted
Can someone help me debug this?
I followed the quickstart guides and the PyTorch tutorials from the documentation, but I don't know what I am doing wrong. For instance, I made sure that my VM instance and my TPU instance are in the same zone, but the error persists. I also tried running the script as XRT_TPU_CONFIG="tpu_worker;0;{IP_ADDRESS}:8470" python test.py, but I still get the same error.
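
For reference, here is a minimal sketch of how the XRT_TPU_CONFIG value can be built and set from Python instead of on the command line, to rule out a quoting or typo problem in the shell. The helper name `xrt_tpu_config` and the example IP address are hypothetical; only the `tpu_worker;0;{IP_ADDRESS}:8470` format is taken from the command I ran. It does not import torch_xla itself, so it only checks the string, not the TPU connection.

```python
import ipaddress
import os


def xrt_tpu_config(ip, port=8470, worker="tpu_worker", ordinal=0):
    """Build a value in the form 'tpu_worker;0;{IP_ADDRESS}:8470'.

    Hypothetical helper: it validates the IP and formats the string,
    nothing more.
    """
    # Raises ValueError if `ip` is not a valid IPv4/IPv6 address.
    ipaddress.ip_address(ip)
    return f"{worker};{ordinal};{ip}:{port}"


if __name__ == "__main__":
    # Placeholder address (assumption): replace with the TPU's internal IP.
    os.environ["XRT_TPU_CONFIG"] = xrt_tpu_config("10.128.0.2")
    print(os.environ["XRT_TPU_CONFIG"])
    # The variable has to be set before torch_xla is imported by test.py.
```

With this in place, the variable would be set at the top of test.py, before any torch_xla import, rather than as a prefix on the python command.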