I am working on text summarization, and to build my vocabulary I have trained on a dataset. Now I need the vectors for those vocab words from Google's Word2Vec. I've written simple code that takes each word and searches for it in the google-vectors file, which contains around 3 million words. The problem is that this kind of linear search would take weeks to finish. I am using Python. How can I look these words up more efficiently?
found_counter = 0

# Read the vocab words, one per line
with open('vocab_training.txt', 'r') as file1:
    vocab_lines = file1.readlines()

for i, line in enumerate(vocab_lines):
    if i >= 50:  # only checking the first 50 words for now
        break
    word = line.strip().lower()
    # Re-scan the ~3 million-line vectors file for every single word
    with open('google-vectors.txt', 'r') as file2:
        for line2 in file2:
            # the first token on each line is the word, the rest is its vector
            if word == line2.split()[0]:
                found_counter += 1
                break

print(found_counter)
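A rough sketch of what I think a single-pass version might look like, though I'm not sure it's the best approach: load the vocab into a set (O(1) membership tests), then read the big file only once and keep the lines whose first token is in the vocab. This assumes google-vectors.txt is plain text with the word as the first token on each line, followed by the vector values.

# Build a set of vocab words for constant-time lookups
with open('vocab_training.txt', 'r') as f:
    vocab = {line.strip().lower() for line in f}

# Scan the 3M-line vectors file exactly once
word_vectors = {}
with open('google-vectors.txt', 'r') as f:
    for line in f:
        parts = line.split()
        if parts and parts[0].lower() in vocab:
            word_vectors[parts[0]] = [float(x) for x in parts[1:]]

print(len(word_vectors), 'of', len(vocab), 'vocab words found')

Would something like this be the right direction, or is there a better-established way (e.g. a library that indexes the vectors) to do these lookups?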