I am trying to use Doc2Vec to convert sentences to vectors and then use those vectors to train a TensorFlow classifier.
I am a little confused about what tags are used for, and about how to extract all of the document vectors from Doc2Vec after training has finished.
My code so far is as follows:
import pandas as pd
import gensim
from gensim.models.doc2vec import TaggedDocument

# Read the two files (one title per line)
fake_data = pd.read_csv('./sentences/fake.txt', sep='\n')
real_data = pd.read_csv('./sentences/real.txt', sep='\n')

# Tag each document with its label plus a unique integer index
sentences = []
for i, row in fake_data.iterrows():
    sentences.append(TaggedDocument(row['title'].lower().split(), ['fake', len(sentences)]))
for i, row in real_data.iterrows():
    sentences.append(TaggedDocument(row['title'].lower().split(), ['real', len(sentences)]))

model = gensim.models.Doc2Vec(sentences)
I get vectors when I do print(model.docvecs[1]) and so on, but they are different every time I rebuild the model.
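To be concrete, this is roughly how I am seeing the difference (model_a and model_b are just throwaway names for two models built the same way):

model_a = gensim.models.Doc2Vec(sentences)
model_b = gensim.models.Doc2Vec(sentences)

# The same document index gives noticeably different numbers in each model
print(model_a.docvecs[1])
print(model_b.docvecs[1])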
First of all: have I used Doc2Vec
correctly?
Second: is there a way to grab the vectors for all documents tagged 'real' or 'fake', turn them into a NumPy array, and pass that into TensorFlow?
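Roughly what I am picturing is something like the snippet below, where the integer ranges are just my guess based on the order I appended the documents above, so this may well be the wrong approach:

import numpy as np

# Integer tags 0..len(fake_data)-1 went to fake titles in the loops above,
# and the remaining integers went to real titles
fake_idx = range(len(fake_data))
real_idx = range(len(fake_data), len(fake_data) + len(real_data))

X_fake = np.array([model.docvecs[i] for i in fake_idx])
X_real = np.array([model.docvecs[i] for i in real_idx])

X = np.concatenate([X_fake, X_real])                   # feature matrix for tensorflow
y = np.array([0] * len(X_fake) + [1] * len(X_real))    # 0 = fake, 1 = real

If there is a more direct way to do this using the 'fake'/'real' tags themselves, that is the part I'm confused about.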