11

I'm trying to access TensorBoard for the tensorflow_resnet_cifar10_with_tensorboard example, but I'm not sure what the URL should be. The help text gives two options:

You can access TensorBoard locally at http://localhost:6006 or using your SageMaker notebook instance proxy/6006/(TensorBoard will not work if forget to put the slash, '/', in end of the url). If TensorBoard started on a different port, adjust these URLs to match.

When it says "access locally", does that mean the local container SageMaker creates in AWS? If so, how do I get there?

Or, if I use run_tensorboard_locally=False, what should the proxy URL be?

WBC

4 Answers

19

Here is my solution:

If the URL of my SageMaker notebook instance is:

https://myinstance.notebook.us-east-1.sagemaker.aws/notebooks/image_classify.ipynb

then the URL for accessing TensorBoard is:

https://myinstance.notebook.us-east-1.sagemaker.aws/proxy/6006/
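The same substitution can be sketched in Python. This is only an illustration of the URL pattern above: the helper name `tensorboard_proxy_url` is made up for this example, and it assumes TensorBoard is on port 6006 unless told otherwise.

```python
from urllib.parse import urlsplit, urlunsplit


def tensorboard_proxy_url(notebook_url: str, port: int = 6006) -> str:
    """Build the proxy URL for a port on the same notebook instance host.

    Hypothetical helper for illustration; not part of the SageMaker SDK.
    """
    parts = urlsplit(notebook_url)
    # Keep the scheme and host, and replace the path with /proxy/<port>/.
    # The trailing slash matters: TensorBoard will not load without it.
    return urlunsplit((parts.scheme, parts.netloc, f"/proxy/{port}/", "", ""))


print(tensorboard_proxy_url(
    "https://myinstance.notebook.us-east-1.sagemaker.aws/notebooks/image_classify.ipynb"
))
```

If TensorBoard started on a different port, pass it as `port` and the URL changes accordingly.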
derHugo
T.C. Liu
3

You can access TensorBoard on your notebook using the link "proxy/6006".

If you set run_tensorboard_locally=False, TensorBoard is not started at all.

If the URL you clicked gives you the error "[Errno 111] Connection refused", training has most likely already finished. According to https://github.com/aws/sagemaker-python-sdk, the SDK "terminates TensorBoard when the execution ends", so it seems you can only access it while the training step is running.
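One way to tell whether TensorBoard is still running is to check, from a notebook cell or terminal on the same instance, whether anything is listening on the port. The following is a rough sketch using only the standard library; the function name `is_port_open` is invented for this example, and port 6006 is assumed.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 6006, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused" (nothing is listening) and timeouts.
        return False


if is_port_open():
    print("Something is listening on port 6006; the proxy URL should work.")
else:
    print("Nothing is listening on port 6006; TensorBoard is probably not running.")
```

A False result here is consistent with the "Connection refused" error from the proxy URL, though it does not say why TensorBoard stopped.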

1

"Local" there refers to the machine that runs the estimator.fit method. So if you are running the example notebook on a SageMaker notebook instance, TensorBoard will be running on that machine.

The "proxy/6006" part of the text you quoted is a clickable link which will bring up TensorBoard on your notebook. The full URL will be "https://<notebook-instance-name>.notebook.<region>.sagemaker.aws/proxy/6006/".

  • 2
    I think I'm missing something simple, I don't see any clickable link and that URL gives me `[Errno 111] Connection refused`. Would you mind sharing a screenshot of the link for the cifar10 example? – WBC Jan 09 '18 at 23:00
  • 1
    I'm having the same issue – Omri Bahat Treidel May 25 '18 at 02:11
0

You can find a more detailed tutorial here: https://docs.aws.amazon.com/sagemaker/latest/dg/studio-tensorboard.html

You can save your logs like this:

import datetime
import os

LOG_DIR = os.path.join(os.getcwd(), "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))

EFS_PATH_LOG_DIR = "/".join(LOG_DIR.strip("/").split('/')[1:-1])
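To see what the EFS_PATH_LOG_DIR expression produces, here is a small sketch that applies the same string manipulation to a fixed path. The path `/root/logs/fit/20240101-120000` is made up for illustration, and `efs_path` is a hypothetical wrapper, not something from the tutorial.

```python
def efs_path(log_dir: str) -> str:
    """Drop the first path component and the last (timestamp) component."""
    return "/".join(log_dir.strip("/").split("/")[1:-1])


# Example with an assumed working directory of /root:
print(efs_path("/root/logs/fit/20240101-120000"))  # prints "logs/fit"
```

So the value passed to --logdir is the parent directory of the timestamped runs, relative to the first path component.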

Then launch TensorBoard by following these steps: open a new terminal, install TensorBoard, and launch it (copy EFS_PATH_LOG_DIR from the Jupyter notebook):

pip install tensorboard 
tensorboard --logdir <EFS_PATH_LOG_DIR>

Open TensorBoard: https://<YOUR_Notebook_URL>.studio.region.sagemaker.aws/jupyter/default/proxy/6006/

If you store your logs in an S3 bucket, you can launch it from the terminal with:

AWS_REGION=region tensorboard --logdir s3://bucket_name/logs/

and then open the same URL again: https://<YOUR_Notebook_URL>.studio.region.sagemaker.aws/jupyter/default/proxy/6006/

cndv