How do I use the Embedding Projector included in Tensorboard?
I can't find any documentation for it. There are some references to it here, but there's no step-by-step example/tutorial on how to use it.
As far as I am aware, this is the only documentation about embedding visualization on the TensorFlow website. The code snippet there might not be very instructive for first-time users, so here is an example usage:
import os
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

LOG_DIR = 'logs'

# Store the MNIST test images in a variable so they can be checkpointed.
mnist = input_data.read_data_sets('MNIST_data')
images = tf.Variable(mnist.test.images, name='images')

with tf.Session() as sess:
    saver = tf.train.Saver([images])
    sess.run(images.initializer)
    saver.save(sess, os.path.join(LOG_DIR, 'images.ckpt'))
Here we first create a TensorFlow variable (images) and then save it using tf.train.Saver. After executing the code we can launch TensorBoard by issuing the tensorboard --logdir=logs command and opening localhost:6006 in a browser.
However, this visualisation is not very helpful because we do not see the different classes to which each data point belongs. In order to distinguish one class from another, one should provide some metadata:
import os
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.contrib.tensorboard.plugins import projector

LOG_DIR = 'logs'
metadata = os.path.join(LOG_DIR, 'metadata.tsv')

mnist = input_data.read_data_sets('MNIST_data')
images = tf.Variable(mnist.test.images, name='images')

# Write one label per line; the projector matches line i to row i of the tensor.
with open(metadata, 'w') as metadata_file:
    for row in mnist.test.labels:
        metadata_file.write('%d\n' % row)

with tf.Session() as sess:
    saver = tf.train.Saver([images])
    sess.run(images.initializer)
    saver.save(sess, os.path.join(LOG_DIR, 'images.ckpt'))

config = projector.ProjectorConfig()
# One can add multiple embeddings.
embedding = config.embeddings.add()
embedding.tensor_name = images.name
# Link this tensor to its metadata file (e.g. labels).
embedding.metadata_path = metadata
# Saves a config file that TensorBoard will read during startup.
projector.visualize_embeddings(tf.summary.FileWriter(LOG_DIR), config)
Which gives us a projector view in which each MNIST test point is labelled with its digit class.
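If you additionally want to see the digit images themselves rather than bare points, the same config can point at a sprite sheet. This is only a sketch: it assumes you have already assembled all thumbnails into a single file, here hypothetically named mnist_sprite.png, with each thumbnail being 28x28 pixels:

# Extend the config from above with a sprite sheet of thumbnails.
# mnist_sprite.png is an assumed, pre-built file containing all thumbnails.
embedding.sprite.image_path = os.path.join(LOG_DIR, 'mnist_sprite.png')
# Tell the projector the size of each individual thumbnail.
embedding.sprite.single_image_dim.extend([28, 28])
projector.visualize_embeddings(tf.summary.FileWriter(LOG_DIR), config)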
Sadly, I cannot find more comprehensive documentation.
Now you can use the Embedding Projector easily in Colab with PyTorch's SummaryWriter:
import numpy as np
import tensorflow as tf
import tensorboard as tb
# Work around a TensorBoard/TensorFlow incompatibility in add_embedding.
tf.io.gfile = tb.compat.tensorflow_stub.io.gfile
from torch.utils.tensorboard import SummaryWriter

vectors = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0], [1, 1, 1]])
metadata = ['001', '010', '100', '111']  # labels, one per row of vectors

writer = SummaryWriter()  # writes to ./runs by default
writer.add_embedding(vectors, metadata)
writer.close()

%load_ext tensorboard
%tensorboard --logdir=runs
The %tensorboard magic now works properly again.
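If you want a thumbnail for each point rather than a bare label, add_embedding also accepts a label_img tensor of shape (N, C, H, W). A minimal sketch with made-up random data:

import torch
from torch.utils.tensorboard import SummaryWriter

features = torch.randn(4, 3)           # 4 points in a 3-dimensional space
labels = ['a', 'b', 'c', 'd']          # one label per point
thumbnails = torch.rand(4, 1, 28, 28)  # (N, C, H, W) grayscale thumbnails

writer = SummaryWriter()
writer.add_embedding(features, metadata=labels, label_img=thumbnails)
writer.close()

In Colab this still needs the tf.io.gfile workaround shown above.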
@Ehsan
Your explanation is very good. The key here is that every Variable has to be initialized before the saver.save(...) call.
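For example, a minimal sketch (the names here are only illustrative) showing initialization happening before the save:

import tensorflow as tf

images = tf.Variable(tf.zeros([10, 784]), name='images')
saver = tf.train.Saver([images])

with tf.Session() as sess:
    # Initialize first; saving an uninitialized variable raises
    # a FailedPreconditionError.
    sess.run(tf.global_variables_initializer())
    saver.save(sess, 'logs/images.ckpt')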
@Everyone
Also, the TensorBoard embedding projector simply visualizes instances of the saved Variable class. It doesn't care whether they hold words or images or anything else.
The official doc https://www.tensorflow.org/get_started/embedding_viz does not point out that it is a direct visualization of a matrix, which in my opinion introduces a lot of confusion.
Maybe you wonder what it means to visualize a matrix. A matrix can be interpreted as a collection of points in a space.
If I have a matrix with shape (100, 200), I can interpret it as a collection of 100 points, where each point has 200 dimensions. In other words, 100 points in a 200-dimensional space.
In the word2vec case, we have 100 words where each word is represented by a vector of length 200. The TensorBoard embedding projector simply uses PCA or t-SNE to visualize this collection (matrix).
Therefore, you can throw in any random matrix. If you throw in an image with shape (1080, 1920), it will visualize each row of that image as if it were a single point.
That being said, you can visualize the embedding of any Variable class instances by simply saving them:
saver = tf.train.Saver([a, _list, of, wanted, variables])
# ... some code you may or may not have ...
saver.save(sess, os.path.join(LOG_DIR, 'filename.ckpt'))
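For instance, here is a minimal end-to-end sketch of the idea above, visualizing a random (100, 200) matrix as 100 points in a 200-dimensional space (the variable name random_points is just illustrative):

import os
import numpy as np
import tensorflow as tf
from tensorflow.contrib.tensorboard.plugins import projector

LOG_DIR = 'logs'

# Any (rows, dims) matrix works: here, 100 points with 200 dimensions each.
points = tf.Variable(np.random.rand(100, 200).astype(np.float32), name='random_points')

with tf.Session() as sess:
    sess.run(points.initializer)
    saver = tf.train.Saver([points])
    saver.save(sess, os.path.join(LOG_DIR, 'random.ckpt'))

config = projector.ProjectorConfig()
embedding = config.embeddings.add()
embedding.tensor_name = points.name
projector.visualize_embeddings(tf.summary.FileWriter(LOG_DIR), config)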
I will try to make a detailed tutorial later.
It sounds like you want to get the Visualization section with t-SNE running on TensorBoard. As you've described, the TensorFlow API has provided only the bare essential commands in the how-to document.
I’ve uploaded my working solution with the MNIST dataset to my GitHub repo.
Original Stackoverflow answer: TensorBoard Embedding Example?