I'm trying to create an RNN that would predict the next word, given the previous word. But I'm struggling with modeling this into a dataset, specifically, how to indicate the next word to be predicted as a 'label'.
I could use a hot label encoded vector for each word in the vocabulary, but a) it'll have tens of thousands of dimensions, given the large vocabulary, b) I'll lose all the other info contained in the word vector. Perhaps that info would be useful in calculating the error, i.e how off the predictions were from the actual word.
What should I do? Should I just use the one hot encoded vector?