1

I am running a Google Colab notebook and trying to capture TPU profiling data for use in TensorBoard, but I can't get capture_tpu_profile to run in the background while my TensorFlow code is running.

So far I have tried to run the capture process in the background with:

!capture_tpu_profile --logdir=gs://<my_logdir> --tpu=$COLAB_TPU_ADDR &

and

!bg capture_tpu_profile --logdir=gs://<my_logdir> --tpu=$COLAB_TPU_ADDR
Bob Smith
Jann

2 Answers

5

It turns out that one way to do this is to start the process directly from Python, like this (I also had to change the parameter from --tpu to --service_addr):

import os
import subprocess
# Popen returns immediately, so the profiler keeps running while later cells execute.
subprocess.Popen(["capture_tpu_profile", "--logdir=gs://<my_logdir>", "--service_addr={}".format(os.environ["COLAB_TPU_ADDR"])])

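Note that subprocess.Popen returns immediately and does not raise an exception if the command later exits with an error. check=True is an argument of subprocess.run, which raises an exception on failure but blocks until the command has finished.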
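
Since Popen does not report failures on its own, it can help to keep a handle on the process and send its output to a log file. The following is only a minimal sketch of that idea: the helper names (profiler_command, start_background, stop_background) and the log file path are hypothetical, and the command-line flags are simply the ones used above. Whether the profiler actually captures data this way may depend on the environment.

```python
import os
import subprocess


def profiler_command(logdir, service_addr):
    """Build the argument list used above (flags taken from this answer)."""
    return [
        "capture_tpu_profile",
        "--logdir={}".format(logdir),
        "--service_addr={}".format(service_addr),
    ]


def start_background(cmd, log_path):
    """Start cmd without blocking and write its output to log_path."""
    log_file = open(log_path, "w")
    # Merge stderr into stdout so error messages end up in the same log.
    proc = subprocess.Popen(cmd, stdout=log_file, stderr=subprocess.STDOUT)
    return proc, log_file


def stop_background(proc, log_file, timeout=10):
    """Wait up to timeout seconds, terminate if still running, return the exit code."""
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.terminate()
        proc.wait()
    log_file.close()
    return proc.returncode


# Hypothetical usage in a Colab cell (assumes COLAB_TPU_ADDR is set):
# cmd = profiler_command("gs://<my_logdir>", os.environ["COLAB_TPU_ADDR"])
# proc, log_file = start_background(cmd, "profiler.log")
# ... run the TensorFlow code here ...
# exit_code = stop_background(proc, log_file)
```

A non-zero exit code, or an error message in the log file, indicates that the profiler itself failed rather than that it ran and found nothing to record.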

Jann
  • Did this actually work for you on Google Colaboratory? I have been trying everything and I cannot get it to work. It just results in not capturing any data, especially the events file being empty etc. (e.g. the file name would end with ".profile-empty") – Michielver Nov 17 '19 at 07:59
  • It did not work for me. The logdir requires a google storage bucket and giving the right access rights to the TPU pod that writes the logfiles. When this is done, the logging is activated, but it does not run asynchronously, i.e. you can't run the profiler and your model at the same time. If you run "!capture_tpu_profile --monitoring_level 2 --tpu $TPU_NAME", the profiler runs. The problem is to get it to run in the background. – Emil Jun 02 '20 at 11:05
0

One way to do this is to use the TPUProfilerHook, which runs the profiler as a session hook:

https://github.com/tensorflow/tpu/blob/master/models/common/tpu_profiler_hook.py

An example is here: https://github.com/tensorflow/tpu/blob/5d838047af0163bdf7b97b9404648dc2961c4b63/models/official/resnet/resnet_main.py#L699

michaelb