
I'm looking for a TensorBoard embedding example, for instance with the iris data, like the Embedding Projector at http://projector.tensorflow.org/

Unfortunately, I couldn't find one. There is only a little information about how to do it at https://www.tensorflow.org/how_tos/embedding_viz/

Does anyone know of a basic tutorial for this functionality?

Basics:

1) Set up a 2D tensor variable (or several) to hold your embedding(s).

embedding_var = tf.Variable(....)

2) Periodically save your embeddings in a LOG_DIR.

3) Associate metadata with your embedding.
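
For small data such as iris, the standalone projector at http://projector.tensorflow.org/ can also load plain tab-separated files, which sidesteps the checkpoint machinery. The sketch below writes a vectors file and a matching labels file using only the standard library. The sample rows, the file names `vectors.tsv` and `metadata.tsv`, and the helper `write_projector_files` are illustrative assumptions, not part of any TensorFlow API.

```python
import csv
import os
import tempfile

# Illustrative rows in the shape of the iris data set:
# four measurements per sample plus a species label.
SAMPLES = [
    ([5.1, 3.5, 1.4, 0.2], "setosa"),
    ([7.0, 3.2, 4.7, 1.4], "versicolor"),
    ([6.3, 3.3, 6.0, 2.5], "virginica"),
]


def write_projector_files(samples, out_dir):
    """Write one tab-separated vector per line and one label per line."""
    os.makedirs(out_dir, exist_ok=True)
    vectors_path = os.path.join(out_dir, "vectors.tsv")
    metadata_path = os.path.join(out_dir, "metadata.tsv")
    with open(vectors_path, "w", newline="") as vec_file, \
            open(metadata_path, "w", newline="") as meta_file:
        vec_writer = csv.writer(vec_file, delimiter="\t", lineterminator="\n")
        for vector, label in samples:
            vec_writer.writerow(vector)
            # Single-column metadata: no header row, line i labels vector i.
            meta_file.write(label + "\n")
    return vectors_path, metadata_path


if __name__ == "__main__":
    print(write_projector_files(SAMPLES, tempfile.mkdtemp()))
```

The two files line up row by row, which is the same invariant the checkpoint-based workflow relies on: the i-th metadata line describes the i-th embedding vector.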

Patrick

6 Answers


I've used FastText's pre-trained word vectors with TensorBoard.

import os
import tensorflow as tf
import numpy as np
import fasttext
from tensorflow.contrib.tensorboard.plugins import projector

# load model
word2vec = fasttext.load_model('wiki.en.bin')

# create a list of vectors
embedding = np.empty((len(word2vec.words), word2vec.dim), dtype=np.float32)
for i, word in enumerate(word2vec.words):
    embedding[i] = word2vec[word]

# set up a TensorFlow session; feeding the matrix through a placeholder
# keeps the large array out of the graph definition
tf.reset_default_graph()
sess = tf.InteractiveSession()
X = tf.Variable([0.0], name='embedding')
place = tf.placeholder(tf.float32, shape=embedding.shape)
set_x = tf.assign(X, place, validate_shape=False)
sess.run(tf.global_variables_initializer())
sess.run(set_x, feed_dict={place: embedding})

# write labels (create the log directory first so the file can be opened)
os.makedirs('log', exist_ok=True)
with open('log/metadata.tsv', 'w') as f:
    for word in word2vec.words:
        f.write(word + '\n')

# create a TensorFlow summary writer
summary_writer = tf.summary.FileWriter('log', sess.graph)
config = projector.ProjectorConfig()
embedding_conf = config.embeddings.add()
embedding_conf.tensor_name = 'embedding:0'
embedding_conf.metadata_path = os.path.join('log', 'metadata.tsv')
projector.visualize_embeddings(summary_writer, config)

# save the model
saver = tf.train.Saver()
saver.save(sess, os.path.join('log', "model.ckpt"))

Then run this command in your terminal:

tensorboard --logdir=log
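
If the labels look misaligned in the projector, a mismatch between the number of metadata rows and the number of embedding vectors is a likely cause. The following sketch counts the rows of a metadata file so they can be compared with `embedding.shape[0]`. The helper names `count_metadata_rows` and `check_metadata` are hypothetical, and treating a tab in the first line as the sign of a multi-column file with a header row is an assumption made here for illustration.

```python
import os
import tempfile


def count_metadata_rows(path):
    """Count label rows, skipping the header of a multi-column file."""
    with open(path, encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f]
    if not lines:
        return 0
    # Assumption: a tab in the first line means several columns,
    # in which case the first line is a header rather than a label.
    has_header = "\t" in lines[0]
    return len(lines) - 1 if has_header else len(lines)


def check_metadata(path, expected_rows):
    """Raise ValueError if the file does not have one row per vector."""
    found = count_metadata_rows(path)
    if found != expected_rows:
        raise ValueError(
            "metadata has {} rows, expected {}".format(found, expected_rows))
    return found


if __name__ == "__main__":
    demo = os.path.join(tempfile.mkdtemp(), "metadata.tsv")
    with open(demo, "w", encoding="utf-8") as f:
        f.write("the\nof\nand\n")
    print(check_metadata(demo, 3))
```

With the code above, this would be called as `check_metadata('log/metadata.tsv', embedding.shape[0])` before starting TensorBoard.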
Samir
  • To complete the journey, install `jupyter-tensorboard` to call TensorBoard directly from Jupyter Notebook. – cylim Apr 13 '18 at 14:24

It sounds like you want to get the visualization section with t-SNE running on TensorBoard. As you've described, the TensorFlow how-to document provides only the bare essential API commands.

I’ve uploaded my working solution with the MNIST dataset to my GitHub repo.

Yes, it is broken down into three general steps:

  1. Create metadata for each data point.
  2. Associate images with each data point.
  3. Load the data into TensorFlow and save the embeddings in a LOG_DIR.

Only generic details are included with the TensorFlow r0.12 release. There is no full code example that I’m aware of within the official source code.

I found that two tasks were involved that were not documented in the how-to.

  1. Preparing the data from the source
  2. Loading the data into a tf.Variable

While TensorFlow is designed for the use of GPUs, in this situation I opted to generate the t-SNE visualization on the CPU, as the process took up more memory than my MacBook Pro's GPU has access to.

API access to the MNIST dataset is included with TensorFlow, so I used that. The MNIST data comes as a structured NumPy array. Passing a slice of it through the tf.stack function yields a single tensor that can be stored in a variable and embedded into a visualization. The following code is how I extracted the data and set up the TensorFlow embedding variable.

with tf.device("/cpu:0"):
    embedding = tf.Variable(tf.stack(mnist.test.images[:FLAGS.max_steps], axis=0), trainable=False, name='embedding')

Creating the metadata file was performed by slicing a NumPy array.

def save_metadata(file):
    # The labels are one-hot encoded; argmax recovers the digit for each row.
    labels = np.argmax(mnist.test.labels[:FLAGS.max_steps], axis=1)
    with open(file, 'w') as f:
        for c in labels:
            f.write('{}\n'.format(c))
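
The metadata file needs one label per embedding row, in the same order. Since the MNIST labels used here are one-hot encoded, each row has to be turned back into a digit first. A dependency-free sketch of that conversion follows; the helper names `one_hot_to_index` and `metadata_lines` are hypothetical and stand in for the NumPy slicing above.

```python
def one_hot_to_index(row):
    """Return the position of the largest entry (the 1 in a one-hot row)."""
    return max(range(len(row)), key=row.__getitem__)


def metadata_lines(one_hot_labels, limit):
    """Build one metadata line per label for the first `limit` rows."""
    return ["{}\n".format(one_hot_to_index(row))
            for row in one_hot_labels[:limit]]


if __name__ == "__main__":
    labels = [
        [0, 0, 1, 0],  # class 2
        [1, 0, 0, 0],  # class 0
        [0, 0, 0, 1],  # class 3
    ]
    print("".join(metadata_lines(labels, 2)), end="")
```

The `limit` argument plays the role of `FLAGS.max_steps`: it must match the number of images placed in the embedding variable.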

Associating an image (sprite) file with the embedding works as described in the how-to. I've uploaded a PNG file of the first 10,000 MNIST images to my GitHub.
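
The sprite is a single image that tiles all thumbnails in a square grid, filled row by row so that the first data point sits in the top-left corner. The sketch below works out the grid size and the pixel position of a given thumbnail; the function names `sprite_grid_side` and `thumbnail_origin` are hypothetical, and the 28x28 thumbnail size is simply the MNIST image size.

```python
import math


def sprite_grid_side(num_images):
    """Smallest number of thumbnails per side of a square grid."""
    if num_images <= 0:
        return 0
    # Integer form of ceil(sqrt(num_images)), avoiding float rounding.
    return math.isqrt(num_images - 1) + 1


def thumbnail_origin(index, grid_side, thumb_height, thumb_width):
    """Top-left pixel (row, col) of thumbnail `index`, filled row by row."""
    row, col = divmod(index, grid_side)
    return row * thumb_height, col * thumb_width


if __name__ == "__main__":
    side = sprite_grid_side(10000)
    print(side, side * 28)  # thumbnails per side, sprite size in pixels
    print(thumbnail_origin(101, side, 28, 28))
```

For 10,000 MNIST digits this gives a 100 by 100 grid, that is, a sprite of 2800 by 2800 pixels.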

So far TensorFlow works beautifully for me: it’s computationally quick, well documented, and the API appears to be functionally complete for anything I am about to do for the moment. I look forward to generating some more visualizations with custom datasets over the coming year. This post was edited from my blog. Best of luck to you, please let me know how it goes. :)

norman_h
  • Thanks @norman_h, I will check your code and come back :). I'm not working with images but with CSV text for data classification. – Patrick Dec 21 '16 at 16:14
  • @Patrick then I guess you'll just leave out the lines that deal with the sprites and build your `metadata.tsv` slightly differently. – norman_h Dec 21 '16 at 17:08
  • When I try to run tensorboard with your generated model, metadata etc. nothing shows in the GUI. It's just blank. I'm using TF 0.12.0-rc1. Are you missing the `model_checkpoint_path` in the `projector_config.pbtxt` file? – Nicolai Anton Lynnerup Mar 18 '17 at 09:49
  • Upgrade to TensorFlow 1.0 or try an old commit that works with tf0.12.0 https://github.com/normanheckscher/mnist-tensorboard-embeddings/tree/5ae407e5d0d6b49a93b6e9177f1cda81b2828162 – norman_h Mar 18 '17 at 10:12
  • Image is there. Link doesn't 404. – norman_h Feb 21 '19 at 22:28

Check out the talk "Hands-on TensorBoard (TensorFlow Dev Summit 2017)", which demonstrates TensorBoard embedding on the MNIST dataset: https://www.youtube.com/watch?v=eBbEDRsCmv4

Sample code and slides for the talk can be found here: https://github.com/mamcgrath/TensorBoard-TF-Dev-Summit-Tutorial

Malex
  • Could you please provide the URL to the official tutorial? – Franck Dernoncourt Mar 08 '17 at 17:10
  • There is no code at the above link, just a few gists. I am looking for a working example of TensorBoard embedding visualization with t-SNE/PCA that works with TF 1.0; so far no luck. – dartdog Mar 13 '17 at 17:27
  • Have updated the link to the source code to use github. Should be easier to navigate. – Malex Mar 15 '17 at 13:14

An issue has been raised in the TensorFlow GitHub repository: No real code example for using the tensorboard embedding tab #6322 (mirror).

It contains some interesting pointers.


If you are interested, here is some code that uses TensorBoard embeddings to display character and word embeddings: https://github.com/Franck-Dernoncourt/NeuroNER


FYI: How can I select which checkpoint to view in TensorBoard's embeddings tab?

Community
Franck Dernoncourt
  • Corresponding Github response https://github.com/tensorflow/tensorflow/issues/6322#issuecomment-298250484 – Eddie Apr 30 '17 at 19:11

The accepted answer was very helpful in understanding the general sequence:

  1. Create metadata for each vector (sample)
  2. Associate images (sprites) with each vector
  3. Load the data into TensorFlow and save the embeddings using a checkpoint and a summary writer (make sure the paths are consistent throughout the process).
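
When checking that the paths are consistent, it helps to know what ends up in the `projector_config.pbtxt` file that `projector.visualize_embeddings` writes into the log directory. The sketch below renders the same kind of text by hand so it can be inspected or compared. The function `render_projector_config` is hypothetical, and the sketch assumes paths use forward slashes and contain no characters that need escaping.

```python
def render_projector_config(tensor_name, metadata_path,
                            sprite_path=None, thumb_size=None):
    """Return projector config text for a single embedding.

    thumb_size is a (width, height) pair, required when sprite_path is set.
    """
    lines = [
        "embeddings {",
        '  tensor_name: "{}"'.format(tensor_name),
        '  metadata_path: "{}"'.format(metadata_path),
    ]
    if sprite_path is not None:
        width, height = thumb_size
        lines += [
            "  sprite {",
            '    image_path: "{}"'.format(sprite_path),
            "    single_image_dim: {}".format(width),
            "    single_image_dim: {}".format(height),
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_projector_config("embedding:0", "metadata.tsv",
                                  sprite_path="sprite.png",
                                  thumb_size=(28, 28)), end="")
```

If the projector tab stays empty, comparing the paths in the generated file with the files actually present in the log directory is a reasonable first check.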

For me, the MNIST-based example still relied too much on pre-trained data and pre-generated sprite & metadata files. To fill this gap I created such an example myself and decided to share it here for anyone interested - the code is on GitHub.


Altermarkive

To take pretrained embeddings and visualize them in TensorBoard:

embedding -> trained embedding

metadata.tsv -> metadata information

max_size -> embedding.shape[0]

import tensorflow as tf
from tensorflow.contrib.tensorboard.plugins import projector

sess = tf.InteractiveSession()

# keep the embedding variable on the CPU; it is only stored, never trained
with tf.device("/cpu:0"):
    tf_embedding = tf.Variable(embedding, trainable=False, name="embedding")

tf.global_variables_initializer().run()
path = "tensorboard"  # log directory that TensorBoard will read
saver = tf.train.Saver()
writer = tf.summary.FileWriter(path, sess.graph)

# tell the projector which tensor to show and where its labels are
config = projector.ProjectorConfig()
embed = config.embeddings.add()
embed.tensor_name = "embedding"
embed.metadata_path = "metadata.tsv"
projector.visualize_embeddings(writer, config)

# save a checkpoint so the projector can load the embedding values
saver.save(sess, path + '/model.ckpt', global_step=max_size)

$ tensorboard --logdir="tensorboard" --port=8080

Prakhar Agarwal