
I am new to NLP and I am trying to do a text classification task. Before doing this, I know that we should do word embedding. My question is: should I train the word embeddings only on the training data (so that the testing data gets its vectors from the embedding model pre-trained on the training data), or on both the training data and the testing data?

Nils Cao

1 Answer


This is a very important question. What people in the NN community typically do is pick a frequency threshold on the training set (e.g. frequency <= 2) and replace every word at or below that threshold with an UNK token. Then at test time, any word that does not appear in the training vocabulary is replaced by UNK's representation.
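A minimal sketch of that idea (library-free Python; the function names, the `<UNK>` string, and the exact threshold rule are illustrative choices, not a fixed convention):

    from collections import Counter

    UNK_TOKEN = "<UNK>"

    def build_vocab(train_sentences, min_freq=2):
        """Build a word->index vocabulary from the training data only.
        Words occurring <= min_freq times are dropped and will map to UNK."""
        counts = Counter(word for sent in train_sentences for word in sent)
        vocab = {UNK_TOKEN: 0}
        for word, freq in counts.items():
            if freq > min_freq:
                vocab[word] = len(vocab)
        return vocab

    def encode(sentence, vocab):
        """Map words to indices; rare/unseen words fall back to UNK's index."""
        return [vocab.get(word, vocab[UNK_TOKEN]) for word in sentence]

    # Example usage
    train = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
    test = [["the", "bird", "sat"]]          # "bird" never appears in training

    vocab = build_vocab(train, min_freq=1)   # keep only words seen more than once
    print(encode(test[0], vocab))            # "bird" maps to the UNK index

The embedding matrix is then sized to this training-set vocabulary (including the UNK entry), so test-time words never add new rows; unseen words simply reuse UNK's vector.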

user3639557