I'm seeing about 99.7% TPU idle time with my training code (https://github.com/ksjae/KoGPT2-train). What general methods are used to reduce idle time? How can I (or any user in general) bring it down to a reasonable level?
How can I find the cause of the long idle time?
*Data is available at gs://kogpt2/model