How does the window size affect word2vec, and how do we choose a window size for different tasks?

Question

For example, if I choose two window sizes, 5 and 50, and train a word2vec model with each, will the 50 one take more time to train? Will the embeddings from the 50 one concentrate more on the semantics of the text, and those from the 5 one more on single words? BTW, the two questions above are just examples of what I am after. My real question is the one in the title: how does the window size affect word2vec, and how do we choose a window size for different tasks?

possible duplicate of https://stackoverflow.com/questions/22272370/word2vec-effect-of-window-size-used — Manuel Alves, Sep 21 '21 at 10:25

gojomo · Accepted Answer · 2020-12-24T18:54:28.550

A larger window will take longer to train.

A larger window will have a stronger effect on runtime in 'skip-gram' mode, where a larger window means more individual center-word predictions & error-backpropagations. It'll have a milder effect on runtime in 'CBOW' mode, where it just means more averaging of input-vectors and fan-out of the final effects for each prediction/backpropagation.
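To make the runtime difference concrete, here is a rough counting sketch in plain Python. It is not how any particular library is implemented: `count_updates` is a hypothetical helper, and it assumes the full window is always used, ignoring subsampling, negative sampling, and any random window shrinking an implementation might apply. It only counts how many (center, context) pairs skip-gram would process versus how many averaged-context predictions CBOW would make in one pass.

```python
def count_updates(tokens, window):
    """Return (skipgram_pairs, cbow_predictions) for one pass over tokens.

    Assumption: every position uses the full window on both sides,
    truncated only at the start and end of the token list.
    """
    skipgram_pairs = 0
    cbow_predictions = 0
    n = len(tokens)
    for i in range(n):
        lo = max(0, i - window)
        hi = min(n, i + window + 1)
        context_size = (hi - lo) - 1  # neighbours in range, excluding position i
        # Skip-gram: one prediction/backpropagation per (center, context) pair.
        skipgram_pairs += context_size
        # CBOW: one prediction per position, from the averaged context vectors.
        if context_size > 0:
            cbow_predictions += 1
    return skipgram_pairs, cbow_predictions


# Synthetic 1000-token "text", used only for counting.
tokens = ["w%d" % i for i in range(1000)]
for window in (5, 50):
    sg_pairs, cbow_preds = count_updates(tokens, window)
    print(window, sg_pairs, cbow_preds)
```

Under these assumptions, going from window 5 to window 50 multiplies the skip-gram pair count by roughly ten, while the number of CBOW predictions stays the same (each one just averages more input vectors).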

For how it affects the character of the resulting word-vectors, there's some discussion & a related research paper in a prior answer: Word2Vec: Effect of window size used (https://stackoverflow.com/questions/22272370/word2vec-effect-of-window-size-used)

Generally, you'd optimize the window value the same as any other tunable parameter, by devising some repeatable way to score the final word-vectors on your real task (or a close/correlated simulation), then trying a range of values to see which scores best on your evaluation.
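That tuning loop could be sketched roughly as follows. Everything named here is an assumption for illustration: `pick_window`, `train`, and `score` are hypothetical, and the toy stand-ins exist only so the sketch runs; the toy scores are made up and say nothing about which window is actually best. In practice `train` would fit word-vectors with the given window (for example via your library's `window` parameter), and `score` would run your own repeatable evaluation.

```python
def pick_window(candidates, train, score):
    """Train once per candidate window; return (best_window, scores_by_window).

    `train(window)` and `score(model)` are caller-supplied (hypothetical)
    callables; `score` is assumed to return "higher is better".
    """
    scores = {}
    for window in candidates:
        model = train(window)
        scores[window] = score(model)
    best = max(scores, key=scores.get)
    return best, scores


# Toy stand-ins, NOT real training or evaluation: the "model" is just a dict,
# and the score is an arbitrary function that happens to peak at window=10.
def toy_train(window):
    return {"window": window}


def toy_score(model):
    return -abs(model["window"] - 10)


best_window, all_scores = pick_window([2, 5, 10, 20, 50], toy_train, toy_score)
print(best_window, all_scores)
```

Keeping everything except the window fixed (corpus, other parameters, random seed where possible) makes the comparison between runs more meaningful.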

Great explanation, thank you so much! — neese, Dec 24 '20 at 09:47
