
I am using a preemptible TPUv3-8 node with a GCE VM and I am having some difficulty restarting the TPU node after it has been preempted.

On the TPUs page, the TPU node is shown as preempted.

[Screenshot: TPU node has been preempted]

But when I try to start it again, I get an error saying that it is not in a stopped or preempted state. Why is this happening, and what should I do to fix it?

I would also love to know if there is a way to auto-restart the TPU node and run a simple startup script. Thank you.

Joy

1 Answer


This behavior is expected.

The Preemptible TPUs documentation explains how to create preemptible TPU nodes and describes the recommended best practices.

However, at the bottom of the "Detecting if a TPU has been preempted" section, there is a note:

Note: If your Cloud TPU is preempted, you must delete it and create a new one as described in Managing TPUs.

In short, if the TPU node was preempted, you cannot restart it. You must delete it and create a new one.
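The delete-and-recreate step can be scripted. Below is a minimal sketch, assuming the `gcloud` CLI is installed and authenticated on the machine running it. The TPU name, zone, accelerator type and software version are placeholders that you pass in, not values taken from the question, and the function names are made up for illustration.

```python
import subprocess

def describe_cmd(name, zone):
    # Prints only the node state, e.g. READY or PREEMPTED.
    return ["gcloud", "compute", "tpus", "describe", name,
            f"--zone={zone}", "--format=value(state)"]

def delete_cmd(name, zone):
    # --quiet skips the interactive confirmation prompt.
    return ["gcloud", "compute", "tpus", "delete", name,
            f"--zone={zone}", "--quiet"]

def create_cmd(name, zone, accelerator_type, version):
    # Recreates the node as preemptible with the given type and version.
    return ["gcloud", "compute", "tpus", "create", name,
            f"--zone={zone}",
            f"--accelerator-type={accelerator_type}",
            f"--version={version}",
            "--preemptible"]

def recreate_if_preempted(name, zone, accelerator_type, version,
                          run=subprocess.run):
    """Delete and recreate the TPU node if its state is PREEMPTED.

    Returns True if the node was recreated, False otherwise.
    `run` is injectable so the logic can be exercised without gcloud.
    """
    result = run(describe_cmd(name, zone),
                 check=True, capture_output=True, text=True)
    if result.stdout.strip() != "PREEMPTED":
        return False
    run(delete_cmd(name, zone), check=True)
    run(create_cmd(name, zone, accelerator_type, version), check=True)
    return True
```

For example, `recreate_if_preempted("my-tpu", "us-central1-a", "v3-8", "<your-tf-version>")` would be a typical call, where all four arguments are hypothetical and should match your original node. If your node used a custom network or IP range, the create command needs those flags as well.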

As for auto-restarting the TPU node, the only option is the one mentioned in the "Preemptible VMs and TPUs (TPU Nodes only)" section:

Note that the preemptible status of the TPU is independent of the preemptible status of the VM. You can define your TPU as preemptible and the VM as not preemptible, or the other way round. You can also define them both as preemptible.

The most likely combination is a preemptible TPU and a non-preemptible VM.
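With that combination, the non-preemptible VM can watch the TPU and bring it back itself. The following is only a sketch of such a watcher, not a built-in Cloud TPU feature: `get_state`, `recreate` and `on_ready` are hypothetical callables that you supply (for example, a wrapper around `gcloud compute tpus describe`, a delete-and-create routine, and your startup script). The state names `READY` and `PREEMPTED` are the ones reported by the TPU API.

```python
import time

def watch_tpu(get_state, recreate, on_ready,
              poll_seconds=60, max_polls=None, sleep=time.sleep):
    """Poll the TPU state from the VM.

    On PREEMPTED, call recreate(). Each time the node becomes READY
    (at start or after a recreation), call on_ready() exactly once.
    max_polls=None means poll forever.
    """
    was_ready = False
    polls = 0
    while max_polls is None or polls < max_polls:
        state = get_state()
        if state == "PREEMPTED":
            recreate()
            was_ready = False
        elif state == "READY" and not was_ready:
            on_ready()  # e.g. launch the training/startup script
            was_ready = True
        polls += 1
        sleep(poll_seconds)
```

Running this under a process supervisor on the VM (or from a cron job with `max_polls=1`) is one way to keep it alive. Since a preemption interrupts whatever was running, the startup script should resume from a saved checkpoint.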

PjoterS