I have a set of documents in a pandas DataFrame. I am transforming those documents to vectors with gensim Doc2Vec:
import gensim

def read_corpus(documents):
    # tag each preprocessed document with its integer position in the DataFrame
    for i, plot in enumerate(documents):
        yield gensim.models.doc2vec.TaggedDocument(
            gensim.utils.simple_preprocess(plot, max_len=30), [i])

train_corpus = list(read_corpus(df.note))

model = gensim.models.doc2vec.Doc2Vec(size=50, min_count=2, iter=55)
model.build_vocab(train_corpus)
model.train(train_corpus, total_examples=model.corpus_count, epochs=model.iter)
I then save the document vectors and convert the resulting .w2v file to tsv files. Finally, I overwrite the metadata tsv file so that it contains meaningful labels:
# export only the document vectors (no word vectors) in word2vec format
model.save_word2vec_format('doc_tensor.w2v', doctag_vec=True, word_vec=False)
# convert to TSV files for the TensorBoard projector
%run word2vec2tensor.py -i doc_tensor.w2v -o notes

# replace the generated metadata with one labelled row per document
with open('notes_metadata.tsv', 'w') as w:
    w.write('created_by\tnote_type\n')
    for i, j in zip(df.created_by, df.note_type):
        w.write("%s\t%s\n" % (i, j))
At this point, I have two tsv files: one contains the vectors and the other contains metadata about the vectors. I got to this point by following this tutorial: http://nbviewer.jupyter.org/github/RaRe-Technologies/gensim/blob/8f7c9ff4c546f84d42c220dcf28543500747c171/docs/notebooks/Tensorboard_visualizations.ipynb#Training-the-Doc2Vec-Model.
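As a quick sanity check (my own addition, not part of the tutorial), the projector expects the metadata rows to line up one-to-one with the vector rows, plus the one header line I wrote:

# count rows in the two TSV files produced above; file names assume the -o notes prefix
with open('notes_tensor.tsv') as t, open('notes_metadata.tsv') as m:
    n_vectors = sum(1 for _ in t)
    n_labels = sum(1 for _ in m) - 1  # subtract the header row
print(n_vectors, n_labels)  # these two counts should be equal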
Now I would like to load this embedding and the tsv files into a local TensorBoard. I tried this:
import os
import tensorflow as tf
from tensorflow.contrib.tensorboard.plugins import projector

# load the document vectors from the trained model
embedding = model.docvecs.vectors_docs

# set up a TensorFlow session and copy the vectors into a variable
tf.reset_default_graph()
sess = tf.InteractiveSession()
X = tf.Variable([0.0], name='embedding')
place = tf.placeholder(tf.float32, shape=embedding.shape)
set_x = tf.assign(X, place, validate_shape=False)
sess.run(tf.global_variables_initializer())
sess.run(set_x, feed_dict={place: embedding})

# create a TensorFlow summary writer and point the projector at the metadata
summary_writer = tf.summary.FileWriter('log', sess.graph)
config = projector.ProjectorConfig()
embedding_conf = config.embeddings.add()
embedding_conf.tensor_name = 'embedding:0'
embedding_conf.metadata_path = os.path.join('log', 'metadata.tsv')
projector.visualize_embeddings(summary_writer, config)
This code ran without error, but when I run tensorboard --logdir=log and open the localhost page, the embeddings do not show up.
My folder structure looks like this:
project
- jupyter_notebook_from_which_I_run_my_code.ipynb
- log
- events.out.tfevents.1519305293.COMPUTERNAME
- notes_metadata.tsv
- notes_tensor.tsv
- projector_config.pbtxt
If I click "Choose File" in the TensorBoard projector and choose my notes_tensor.tsv file, it says "Graph visualization failed: The graph is empty. Make sure that the graph is passed to the tf.summary.FileWriter after the graph is defined."
How do I get the tsv files to show up in the projector for t-SNE and PCA visualizations like in the tutorial I linked to earlier?
Update: I tried adding these two lines:
# save the embedding variable to a checkpoint so the projector can find it
saver = tf.train.Saver([X])
saver.save(sess, os.path.join('log', 'model2.ckpt'), 1)
This added these files to log:
checkpoint
model2.ckpt-1.data-00000-of-00001
model2.ckpt-1.index
model2.ckpt-1.meta
It also gave TensorBoard the Projector tab!
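To double-check that the embedding was actually written to the checkpoint (a check I added myself, not from the tutorial), I can list the variables it contains:

# inspect the checkpoint; I expect an 'embedding' entry shaped (number_of_documents, 50)
reader = tf.train.NewCheckpointReader(os.path.join('log', 'model2.ckpt-1'))
for name, shape in reader.get_variable_to_shape_map().items():
    print(name, shape)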
However, there is an error fetching metadata.tsv. That file doesn't exist, and TensorBoard looks for it in /log/log instead of just /log, presumably because the metadata_path I set already starts with 'log' and is resolved relative to the log directory. When I dismiss that error, click "Load", and choose notes_metadata.tsv, nothing happens.
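My current guess (untested) is that metadata_path should point at the metadata file I actually wrote, as a path relative to the log directory, which would mean copying it next to the checkpoint and re-running the projector setup from the snippet above, something like:

import shutil
# untested guess: put the metadata next to the checkpoint and reference it relative to the logdir
shutil.copy('notes_metadata.tsv', os.path.join('log', 'notes_metadata.tsv'))
embedding_conf.metadata_path = 'notes_metadata.tsv'
projector.visualize_embeddings(summary_writer, config)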