I have a set of documents in a pandas DataFrame. I am transforming those documents to vectors with gensim Doc2Vec:
import gensim

def read_corpus(documents):
    # tag each preprocessed document with its integer position in the DataFrame
    for i, plot in enumerate(documents):
        yield gensim.models.doc2vec.TaggedDocument(
            gensim.utils.simple_preprocess(plot, max_len=30), [i])

train_corpus = list(read_corpus(df.note))

model = gensim.models.doc2vec.Doc2Vec(size=50, min_count=2, iter=55)
model.build_vocab(train_corpus)
model.train(train_corpus, total_examples=model.corpus_count, epochs=model.iter)
I then save the document vectors and convert the resulting .w2v file to tsv files. Finally, I overwrite the metadata tsv file so that it contains meaningful labels:
# export only the document vectors (no word vectors) in word2vec format
model.save_word2vec_format('doc_tensor.w2v', doctag_vec=True, word_vec=False)
# convert to TSV files for the TensorBoard projector
%run word2vec2tensor.py -i doc_tensor.w2v -o notes

# replace the generated metadata with one labelled row per document
with open('notes_metadata.tsv', 'w') as w:
    w.write('created_by\tnote_type\n')
    for i, j in zip(df.created_by, df.note_type):
        w.write("%s\t%s\n" % (i, j))
At this point, I have two tsv files: one contains the vectors and the other contains metadata about the vectors. I got to this point by following this tutorial: http://nbviewer.jupyter.org/github/RaRe-Technologies/gensim/blob/8f7c9ff4c546f84d42c220dcf28543500747c171/docs/notebooks/Tensorboard_visualizations.ipynb#Training-the-Doc2Vec-Model.
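As a quick sanity check (my own addition, not part of the tutorial), the projector expects the metadata rows to line up one-to-one with the vector rows, plus the one header line I wrote:

# count rows in the two TSV files produced above; file names assume the -o notes prefix
with open('notes_tensor.tsv') as t, open('notes_metadata.tsv') as m:
    n_vectors = sum(1 for _ in t)
    n_labels = sum(1 for _ in m) - 1  # subtract the header row
print(n_vectors, n_labels)  # these two counts should be equal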
Now I would like to load this embedding and the tsv files into a local TensorBoard. I tried this:
import os
import tensorflow as tf
from tensorflow.contrib.tensorboard.plugins import projector

# load the document vectors from the trained model
embedding = model.docvecs.vectors_docs

# set up a TensorFlow session and copy the vectors into a variable
tf.reset_default_graph()
sess = tf.InteractiveSession()
X = tf.Variable([0.0], name='embedding')
place = tf.placeholder(tf.float32, shape=embedding.shape)
set_x = tf.assign(X, place, validate_shape=False)
sess.run(tf.global_variables_initializer())
sess.run(set_x, feed_dict={place: embedding})

# create a TensorFlow summary writer and point the projector at the metadata
summary_writer = tf.summary.FileWriter('log', sess.graph)
config = projector.ProjectorConfig()
embedding_conf = config.embeddings.add()
embedding_conf.tensor_name = 'embedding:0'
embedding_conf.metadata_path = os.path.join('log', 'metadata.tsv')
projector.visualize_embeddings(summary_writer, config)
This code ran without error, but when I run tensorboard --logdir=log and open the localhost page, the embeddings do not show up.
My folder structure looks like this:
project
- jupyter_notebook_from_which_I_run_my_code.ipynb
- log
- events.out.tfevents.1519305293.COMPUTERNAME
- notes_metadata.tsv
- notes_tensor.tsv
- projector_config.pbtxt
If I click "Choose File" in the TensorBoard projector and choose my notes_tensor.tsv file, it says "Graph visualization failed: The graph is empty. Make sure that the graph is passed to the tf.summary.FileWriter after the graph is defined."
How do I get the tsv files to show up in the projector for t-SNE and PCA visualizations like in the tutorial I linked to earlier?
Update: I tried adding these two lines:
# save the embedding variable to a checkpoint so the projector can find it
saver = tf.train.Saver([X])
saver.save(sess, os.path.join('log', 'model2.ckpt'), 1)
This added these files to log:
checkpoint
model2.ckpt-1.data-00000-of-00001
model2.ckpt-1.index
model2.ckpt-1.meta
It also gave TensorBoard the Projector tab!
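To double-check that the embedding was actually written to the checkpoint (a check I added myself, not from the tutorial), I can list the variables it contains:

# inspect the checkpoint; I expect an 'embedding' entry shaped (number_of_documents, 50)
reader = tf.train.NewCheckpointReader(os.path.join('log', 'model2.ckpt-1'))
for name, shape in reader.get_variable_to_shape_map().items():
    print(name, shape)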
However, there is an error fetching metadata.tsv. That file doesn't exist, and TensorBoard looks for it in /log/log instead of just /log, presumably because the metadata_path I set already starts with 'log' and is resolved relative to the log directory. When I dismiss that error, click "Load", and choose notes_metadata.tsv, nothing happens.
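My current guess (untested) is that metadata_path should point at the metadata file I actually wrote, as a path relative to the log directory, which would mean copying it next to the checkpoint and re-running the projector setup from the snippet above, something like:

import shutil
# untested guess: put the metadata next to the checkpoint and reference it relative to the logdir
shutil.copy('notes_metadata.tsv', os.path.join('log', 'notes_metadata.tsv'))
embedding_conf.metadata_path = 'notes_metadata.tsv'
projector.visualize_embeddings(summary_writer, config)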