How do I use the Embedding Projector included in Tensorboard?
I can't find any documentation for it. There are some references to it here, but there's no step-by-step example/tutorial on how to use it.
As far as I am aware, this is the only documentation about embedding visualization on the TensorFlow website. The code snippet there might not be very instructive for first-time users, so here is an example usage:
import os
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

LOG_DIR = 'logs'

# Store the MNIST test images in a variable so they can be checkpointed.
mnist = input_data.read_data_sets('MNIST_data')
images = tf.Variable(mnist.test.images, name='images')

with tf.Session() as sess:
    saver = tf.train.Saver([images])
    sess.run(images.initializer)
    saver.save(sess, os.path.join(LOG_DIR, 'images.ckpt'))
Here we first create a TensorFlow variable (images) and then save it using tf.train.Saver. After executing the code we can launch TensorBoard by issuing the tensorboard --logdir=logs command and opening localhost:6006 in a browser.
However, this visualisation is not very helpful because we do not see the different classes to which each data point belongs. In order to distinguish one class from another, one should provide some metadata:
import os
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.contrib.tensorboard.plugins import projector

LOG_DIR = 'logs'
metadata = os.path.join(LOG_DIR, 'metadata.tsv')

mnist = input_data.read_data_sets('MNIST_data')
images = tf.Variable(mnist.test.images, name='images')

# Write one label per line; the projector matches line i to row i of the tensor.
with open(metadata, 'w') as metadata_file:
    for row in mnist.test.labels:
        metadata_file.write('%d\n' % row)

with tf.Session() as sess:
    saver = tf.train.Saver([images])
    sess.run(images.initializer)
    saver.save(sess, os.path.join(LOG_DIR, 'images.ckpt'))

config = projector.ProjectorConfig()
# One can add multiple embeddings.
embedding = config.embeddings.add()
embedding.tensor_name = images.name
# Link this tensor to its metadata file (e.g. labels).
embedding.metadata_path = metadata
# Saves a config file that TensorBoard will read during startup.
projector.visualize_embeddings(tf.summary.FileWriter(LOG_DIR), config)
Which gives us a projector view in which each MNIST test point is labelled with its digit class.
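If you additionally want to see the digit images themselves rather than bare points, the same config can point at a sprite sheet. This is only a sketch: it assumes you have already assembled all thumbnails into a single file, here hypothetically named mnist_sprite.png, with each thumbnail being 28x28 pixels:

# Extend the config from above with a sprite sheet of thumbnails.
# mnist_sprite.png is an assumed, pre-built file containing all thumbnails.
embedding.sprite.image_path = os.path.join(LOG_DIR, 'mnist_sprite.png')
# Tell the projector the size of each individual thumbnail.
embedding.sprite.single_image_dim.extend([28, 28])
projector.visualize_embeddings(tf.summary.FileWriter(LOG_DIR), config)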
Sadly, I cannot find more comprehensive documentation.
Now you can use the Embedding Projector easily in Colab with PyTorch's SummaryWriter:
import numpy as np
import tensorflow as tf
import tensorboard as tb
# Work around a TensorBoard/TensorFlow incompatibility in add_embedding.
tf.io.gfile = tb.compat.tensorflow_stub.io.gfile
from torch.utils.tensorboard import SummaryWriter

vectors = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0], [1, 1, 1]])
metadata = ['001', '010', '100', '111']  # labels, one per row of vectors

writer = SummaryWriter()  # writes to ./runs by default
writer.add_embedding(vectors, metadata)
writer.close()

%load_ext tensorboard
%tensorboard --logdir=runs
The %tensorboard magic now works properly again.
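If you want a thumbnail for each point rather than a bare label, add_embedding also accepts a label_img tensor of shape (N, C, H, W). A minimal sketch with made-up random data:

import torch
from torch.utils.tensorboard import SummaryWriter

features = torch.randn(4, 3)           # 4 points in a 3-dimensional space
labels = ['a', 'b', 'c', 'd']          # one label per point
thumbnails = torch.rand(4, 1, 28, 28)  # (N, C, H, W) grayscale thumbnails

writer = SummaryWriter()
writer.add_embedding(features, metadata=labels, label_img=thumbnails)
writer.close()

In Colab this still needs the tf.io.gfile workaround shown above.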
@Ehsan
Your explanation is very good. The key here is that every Variable has to be initialized before the saver.save(...) call.
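For example, a minimal sketch (the names here are only illustrative) showing initialization happening before the save:

import tensorflow as tf

images = tf.Variable(tf.zeros([10, 784]), name='images')
saver = tf.train.Saver([images])

with tf.Session() as sess:
    # Initialize first; saving an uninitialized variable raises
    # a FailedPreconditionError.
    sess.run(tf.global_variables_initializer())
    saver.save(sess, 'logs/images.ckpt')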
@Everyone
Also, the TensorBoard embedding projector simply visualizes instances of the saved Variable class. It doesn't care whether they hold words or images or anything else.
The official doc https://www.tensorflow.org/get_started/embedding_viz does not point out that it is a direct visualization of a matrix, which in my opinion introduces a lot of confusion.
Maybe you wonder what it means to visualize a matrix. A matrix can be interpreted as a collection of points in a space.
If I have a matrix with shape (100, 200), I can interpret it as a collection of 100 points, where each point has 200 dimensions. In other words, 100 points in a 200-dimensional space.
In the word2vec case, we have 100 words where each word is represented by a vector of length 200. The TensorBoard embedding projector simply uses PCA or t-SNE to visualize this collection (matrix).
Therefore, you can throw in any random matrix. If you throw in an image with shape (1080, 1920), it will visualize each row of that image as if it were a single point.
That being said, you can visualize the embedding of any Variable class instances by simply saving them:
saver = tf.train.Saver([a, _list, of, wanted, variables])
# ... some code you may or may not have ...
saver.save(sess, os.path.join(LOG_DIR, 'filename.ckpt'))
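For instance, here is a minimal end-to-end sketch of the idea above, visualizing a random (100, 200) matrix as 100 points in a 200-dimensional space (the variable name random_points is just illustrative):

import os
import numpy as np
import tensorflow as tf
from tensorflow.contrib.tensorboard.plugins import projector

LOG_DIR = 'logs'

# Any (rows, dims) matrix works: here, 100 points with 200 dimensions each.
points = tf.Variable(np.random.rand(100, 200).astype(np.float32), name='random_points')

with tf.Session() as sess:
    sess.run(points.initializer)
    saver = tf.train.Saver([points])
    saver.save(sess, os.path.join(LOG_DIR, 'random.ckpt'))

config = projector.ProjectorConfig()
embedding = config.embeddings.add()
embedding.tensor_name = points.name
projector.visualize_embeddings(tf.summary.FileWriter(LOG_DIR), config)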
I will try to make a detailed tutorial later.
It sounds like you want to get the Visualization section with t-SNE running on TensorBoard. As you've described, the TensorFlow API has provided only the bare essential commands in the how-to document.
I’ve uploaded my working solution with the MNIST dataset to my GitHub repo.
Original Stackoverflow answer: TensorBoard Embedding Example?